THE APPROXIMATE DISTRIBUTIONS OF THE MEAN AND
VARIANCE OF A SAMPLE OF INDEPENDENT VARIABLES

By P. L. Hsu 

The National University of Peking 

1. Introduction. In this paper we shall study the mean and variance of a large number, n (a sample of size n), of mutually independent random variables:

(1) $\xi_1, \xi_2, \ldots, \xi_n,$

having the same probability distribution represented by a (cumulative) distribution function $P(x)$. The $r$th moment, absolute moment, and semi-invariant of $P(x)$ are denoted by $\alpha_r$, $\beta_r$, and $\gamma_r$ respectively. It is assumed that for a certain integer $k \ge 3$, $\beta_k < \infty$ and that $\alpha_2 > 0$. Hence there is no loss of generality in assuming that

(2) $\alpha_1 = 0, \quad \alpha_2 = 1.$

The characteristic function corresponding to $P(x)$ is denoted by $p(t)$.

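We put

(3) $\bar\xi = \frac{1}{n}\sum_{r=1}^{n}\xi_r, \qquad \eta = \frac{1}{n}\sum_{r=1}^{n}(\xi_r - \bar\xi)^2,$

(4) $F(x) = \Pr\{\sqrt{n}\,\bar\xi \le x\}, \qquad G(x) = \Pr\left\{\frac{\sqrt{n}\,(\eta - 1)}{\sqrt{\alpha_4 - 1}} \le x\right\}.$

The definition of $G(x)$ implies that $\alpha_4 < \infty$ and $\alpha_4 - 1 > 0$. The case $\alpha_4 - 1 = 0$ provides an easy degenerate case which will be treated separately (section 4). Cramér's theorem of asymptotic expansion¹ reads as follows: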

Theorem 1. If $P(x)$ is non-singular and if $\beta_k < \infty$ for some integer $k \ge 3$, then

(5) $F(x) = \Phi(x) + \Psi(x) + R(x)$

where

(6) $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy,$

$\Psi(x)$ is a certain linear combination of successive derivatives $\Phi^{(3)}(x), \ldots, \Phi^{(3(k-3))}(x)$ with each coefficient of the form $n^{-\nu/2}$ times a quantity depending only on $k, \alpha_3, \ldots, \alpha_{k-1}$ $(1 \le \nu \le k - 3)$, and

(7) $|R(x)| \le \frac{Q}{n^{(k-2)/2}}$

where $Q$ is a constant depending only on $k$ and $P(x)$.
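As a concrete illustration of (5) in the simplest non-trivial case $k = 4$, the following sketch evaluates $\Phi(x) + \Psi(x)$ with the classical one-term correction $\Psi(x) = -\frac{\alpha_3}{6\sqrt{n}}\,\Phi^{(3)}(x)$, where $\Phi^{(3)}(x) = (x^2 - 1)\Phi'(x)$. The function names are illustrative only and do not come from the paper; the normalisation (2) is assumed.

```python
import math

def phi_density(x: float) -> float:
    """Standard normal density, i.e. the derivative of Phi in (6)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x: float) -> float:
    """Standard normal distribution function (6)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def edgeworth_one_term(x: float, n: int, alpha3: float) -> float:
    """Phi(x) + Psi(x) for k = 4, assuming alpha_1 = 0 and alpha_2 = 1.

    Psi(x) = -(alpha3 / (6 sqrt(n))) * Phi'''(x), with Phi'''(x) = (x^2 - 1) phi(x).
    """
    third_derivative = (x * x - 1.0) * phi_density(x)
    return Phi(x) - alpha3 / (6.0 * math.sqrt(n)) * third_derivative

# Illustrative parent: a centred exponential variable has alpha_3 = 2.
approx_at_zero = edgeworth_one_term(0.0, n=100, alpha3=2.0)
```

For a symmetric parent distribution ($\alpha_3 = 0$) the correction vanishes and the sketch returns $\Phi(x)$ itself.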

¹ H. Cramér: Random Variables and Probability Distributions (1937), Ch. 7. This book will be referred to as (C).
In particular, putting $k = 3$ we get that $|F(x) - \Phi(x)| \le Q n^{-1/2}$ provided $P(x)$ is non-singular and $\beta_3 < \infty$. If the condition of non-singularity of $P(x)$ be removed, then Liapounoff's theorem² furnishes the weaker result: $|F(x) - \Phi(x)| \le A\beta_3 n^{-1/2}\log n$ where $A$ is a numerical constant.

Very recently Berry³ succeeded in removing the factor $\log n$ from Liapounoff's theorem under no other condition than that $\beta_3 < \infty$. We state here Berry's theorem:

Theorem 2. If $\beta_3 < \infty$, then

(8) $|F(x) - \Phi(x)| \le \frac{A\beta_3}{\sqrt{n}}$

where $A$ is a numerical constant.
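The content of (8) can be checked exactly in a simple case. The sketch below assumes each $\xi_r$ takes the values $+1$ and $-1$ with probability $1/2$ (so that $\beta_3 = 1$) and computes l.u.b. $|F(x) - \Phi(x)|$ from the binomial distribution; the function names are illustrative and not part of the paper.

```python
import math

def Phi(x: float) -> float:
    """Standard normal distribution function (6)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance_two_point(n: int) -> float:
    """l.u.b. |F(x) - Phi(x)| when each xi_r is +1 or -1 with probability 1/2.

    Here sqrt(n) * mean = (2B - n) / sqrt(n) with B binomial(n, 1/2), so F is a
    step function and the supremum is approached at a jump, from either side.
    """
    total = 2 ** n
    cdf_left = 0.0
    worst = 0.0
    for b in range(n + 1):
        x = (2 * b - n) / math.sqrt(n)
        cdf_right = cdf_left + math.comb(n, b) / total
        worst = max(worst, abs(cdf_left - Phi(x)), abs(cdf_right - Phi(x)))
        cdf_left = cdf_right
    return worst

# sqrt(n) times the distance should stay bounded as n grows, as (8) asserts.
for n in (25, 100, 400):
    print(n, math.sqrt(n) * sup_distance_two_point(n))
```

Since this parent distribution is a lattice distribution, the example also shows that the order $n^{-1/2}$ in (8) cannot be improved without a condition such as non-singularity.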

An essential step in the proof of these results is the selection of a weighting function $w(x)$ and the appraisal of the integral

(9) $\int_{-\infty}^{\infty} w(u)\{F(u + x) - \Phi(u + x) - \Psi(u + x)\}\,du$

($\Psi \equiv 0$ when $k = 3$). In his book¹ Cramér proves Theorem 1 by taking

(10) $w(u) = (-u)^{\omega - 1}$ when $u < 0$ and $w(u) = 0$ when $u > 0$ $(0 < \omega < 1)$

and proves Liapounoff's theorem by taking

<*» “M - vk 

On the other hand, Berry uses the following weighting function in his proof of Theorem 2:

(12) $w(u) = \frac{1 - \cos Tu}{u^2}.$

The unfortunate selection of the function (11) accounts for the presence of the factor $\log n$ in Liapounoff's theorem.

Now Cramér's proof of Theorem 1, based on the integral (9) with $w(u)$ defined in (10), makes use of a result on that integral due to M. Riesz. A more elementary proof than this can be devised. In fact, one has only to use, with Berry, the function (12) and to adopt his elementary appraisal⁴ of the integral

² (C), Ch. 7.

³ A. C. Berry: “The accuracy of the Gaussian approximation to the sum of independent variates.” Trans. Amer. Math. Soc., Vol. 49 (1941), pp. 122-136. This paper will be referred to as (B).

⁴ Berry proves the inequality (in our notation):

$\left|\int_{-\infty}^{\infty} \frac{1 - \cos Tx}{x^2}\{F(x + a) - \Phi(x + a)\}\,dx\right| \le \int_0^T (T - t)\,\frac{|f(t) - e^{-t^2/2}|}{t}\,dt$
(9) in order to obtain the proof of Theorem 1. One of our purposes is therefore to give an elementary proof of Theorem 1, without reference to the above-mentioned result due to M. Riesz. Section 2 is devoted to this work.

We ought to add that Cramér's theorem and Berry's theorem correspond to Theorems 1 and 2 for the case in which the random variables (1) do not follow the same distribution. The proof given in Section 2 is adaptable to these more general theorems when subjected to appropriate modifications; the assumption of a common distribution function for (1) is only made for the sake of convenience.

So much for the known results for the approximate distribution of $\bar\xi$. By a purely formal operational method Cornish and Fisher⁵ obtain terms of successive approximation to the distribution function of any random variable $X$ with the help of its semi-invariants. It is hardly necessary to emphasize the importance of turning Cornish and Fisher's formal result (asymptotic expansion without appraisal of the remainder) into a mathematical theorem of asymptotic expansion which gives the order of magnitude of the remainder. In this paper we achieve this for the simplest function of (1) next to $\bar\xi$, viz. the $\eta$ in (3). We do not seek to remove the assumption of a common distribution for (1), as there will be no practical significance (e.g. in statistics) of $\eta$ if the variables (1) do not have the same probability distribution. Section 3 is devoted to the proof of the following theorems:

Theorem 3. If $\alpha_6 < \infty$ and $\alpha_4 - 1 - \alpha_3^2 \ne 0$ (it cannot be negative), then

a® I «<.)-*(.) I 

where A is a numerical constant. 

Theorem 4. Let $P(x)$ be non-singular and let $\alpha_{2k} < \infty$ for some integer $k \ge 3$. Then

(14) $G(x) = \Phi(x) + \chi(x) + R_1(x),$

where $\Phi(x)$ is the function (6), $\chi(x)$ is a linear combination of the derivatives $\Phi'(x), \ldots, \Phi^{(3(k-3))}(x)$ with each coefficient of the form $n^{-\nu/2}$ times a quantity depending only on $k$ and $\alpha_3, \alpha_4, \ldots, \alpha_{2k-2}$, and


(B), p. 128. The “appraisal” mentioned here refers to (50), which is contained in (B), p. 128. But Berry's appraisal of the integral in the right-hand side of the above inequality is faulty. He writes


i fOr-*>'-*'■ * 




c )< 2 + 


-ft 


r <,/2 dt 


(B, p. 132, line 3) whilst the last integral ought to be 



.1 — c)t 2 + c — 2«$|e“ <,/2 dt. 


⁵ E. A. Cornish and R. A. Fisher: “Moments and cumulants in the specification of distributions.” Revue de l'Institut International de Statistique (1937), pp. 1-14.
(15) | Ri(x) I < n w=*> if k * 4, 5 or 6

71 

(16) lWl< V(*-S +3 ) 

where $Q_k$ and $Q_k'$ are constants depending only on $k$ and $P(x)$.

It may be noticed that Theorem 3 is a “Berryian” theorem about $G(x)$, its characteristic feature being the absence of any condition on the distribution function except the two on its moments, and that Theorem 4 is a “Cramérian” theorem about $G(x)$, the characteristic feature being the assumption of non-singularity of $P(x)$ besides that $\alpha_{2k} < \infty$.

In proving these theorems we have devised a method which is applicable to getting similar results about functions other than $\eta$, such as functions commonly used in applied statistics: the higher moments about the means, the moment ratios (e.g. K. Pearson's $b_1$ and $b_2$), the covariance, the coefficient of correlation, and “Student's” $t$-statistic. Work on such functions is being done by my university colleagues, and the results will be published shortly.

If $\xi$ is any of the random variables (1), then

$0 \le \mathcal{E}\{a(\xi^2 - 1) + b\xi\}^2 = a^2(\alpha_4 - 1) + 2ab\alpha_3 + b^2$

for all real $(a, b)$. Hence $\alpha_4 - 1 - \alpha_3^2 \ge 0$, and $\alpha_4 - 1 - \alpha_3^2 = 0$ means that there is unit probability that $\xi$ assumes exactly two values. This easy degenerate case is first eliminated in Theorem 3 by the assumption $\alpha_4 - 1 - \alpha_3^2 \ne 0$ and then considered in section 4. In Theorem 4 the condition $\alpha_4 - 1 - \alpha_3^2 \ne 0$ is implied since $\xi$ cannot be a random variable of the nature just described owing to the non-singularity of $P(x)$.
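The inequality $\alpha_4 - 1 - \alpha_3^2 \ge 0$ and its case of equality can be verified numerically. The sketch below assumes discrete parent distributions standardised as in (2); the helper names `moments` and `gap` are hypothetical.

```python
import math

def moments(values, probs, orders=(1, 2, 3, 4)):
    """Raw moments alpha_r of a discrete distribution."""
    return {r: sum(p * v ** r for v, p in zip(values, probs)) for r in orders}

def gap(values, probs) -> float:
    """alpha_4 - 1 - alpha_3^2 for a distribution with alpha_1 = 0, alpha_2 = 1."""
    a = moments(values, probs)
    return a[4] - 1.0 - a[3] ** 2

# Two-point distribution standardised to mean 0 and variance 1.
p = 0.2
q = 1.0 - p
two_point = gap([-math.sqrt(q / p), math.sqrt(p / q)], [p, q])

# Three-point distribution on {-a, 0, a}, also with mean 0 and variance 1.
a = 2.0
w = 1.0 / (2.0 * a * a)
three_point = gap([-a, 0.0, a], [w, 1.0 - 2.0 * w, w])
```

Up to rounding error, `two_point` vanishes (the degenerate case), while `three_point` is strictly positive.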

2. Lemmas. Throughout this paper $A, B, C$, etc. will denote positive numerical constants; $A_k, B_k$ ($A_{km}, B_{km}$), etc., will denote positive constants depending only on some integer $k$ (integers $k$ and $m$), and $Q_k$ ($Q_{km}$) will denote a positive constant depending only on $k$ ($k$ and $m$) and the distribution function $P(x)$.

$\theta$, $\Theta$, $\Theta_k$ ($\Theta_{km}$), $\Delta_k$ ($\Delta_{km}$) will denote respectively quantities such that $|\theta| \le 1$, $|\Theta| \le A$, $|\Theta_k| \le A_k$ ($|\Theta_{km}| \le A_{km}$), $|\Delta_k| \le Q_k$ ($|\Delta_{km}| \le Q_{km}$). These symbols do not necessarily stand for the same quantity at each occurrence. Thus $2\theta = \Theta$, $k\Theta_k = \Theta_k$ etc. In particular any positive function of $k, \alpha_3, \ldots, \alpha_k$ is a $Q_k$.


1.1. Cramér obtains the asymptotic expansion of the characteristic function of the distribution of $\sqrt{n}\,\bar\xi$, viz. $\mathcal{E}(e^{it\sqrt{n}\,\bar\xi})$, when (1) do not have the same distribution, valid for $|t| \le Q_k n^{1/6}$. Since we assume a common distribution for (1), so that the characteristic function is $\{p(t/\sqrt{n})\}^n$, we are able to derive an asymptotic expansion valid for $|t| \le Q_k\sqrt{n}$. The extension to $\{p(t_1/\sqrt{n}, \ldots, t_m/\sqrt{n})\}^n$ presents no difficulty. This is done in the following three lemmas, of which Lemma 3 contains the final result.
Lemma 1.

(17) $\log p(t) = \sum_{r=2}^{k-1}\frac{\gamma_r}{r!}(it)^r + \Theta_k\beta_k|t|^k, \quad \text{for } |t| \le \beta_k^{-1/k}.$


Proof: Since p(t) = 1 + X = 1 + q(t) say, we have, for 


A"!\t\ < i, 


fc! 


5(0 < i ^ < i < f jj -. - 2 <!. 


Hence 

(18) 


log p(0 = z 


(_!)/« liWi^Qi^iiW+oi 


For $1 \le j \le [\tfrac12(k - 1)]$ let us expand each $(-1)^{j+1} j^{-1}\{q(t)\}^j$ to get a polynomial $q_j(t)$ of degree $k - 1$ and a remainder $r_j(t)$. In doing this we regard $q(t)$ formally as a polynomial of degree $k$ in $t$. For this polynomial we have the majorating relation

9(0 « A' 1 ' 1 ', 

whence 

[9(0 ) J « e’^ kul , 

J 

which gives 

(19) | r/0 | < Z ^ < /'ft j t \ k e’^ kU[ < /e y ft| 1 1‘ < -4,ft| 1 |\ 

r—Jfc I 1 


Similarly, 

(20) | 9(0 | t * a " r ‘ )l < AkPic 1 t 1 

From (18), (19), (20) we obtain

(21) $\log p(t) = \sum_j q_j(t) + \Theta_k\beta_k|t|^k.$

Since the sum in (21) must equal the sum in (17), the Lemma is proved.
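The coefficients $\gamma_r$ in (17) are determined by the moments $\alpha_1, \ldots, \alpha_r$. A minimal sketch of this computation, using the standard moment-to-cumulant recursion (which is not stated in the paper), is given below; the function name is hypothetical.

```python
from math import comb

def semi_invariants(alpha):
    """Semi-invariants gamma_1..gamma_k from raw moments alpha_1..alpha_k.

    Uses the standard recursion
        gamma_r = alpha_r - sum_{j=1}^{r-1} C(r-1, j-1) * gamma_j * alpha_{r-j},
    where alpha[r - 1] holds alpha_r.
    """
    gamma = []
    for r in range(1, len(alpha) + 1):
        correction = sum(
            comb(r - 1, j - 1) * gamma[j - 1] * alpha[r - j - 1]
            for j in range(1, r)
        )
        gamma.append(alpha[r - 1] - correction)
    return gamma

# Moments of the standard normal law: every semi-invariant beyond the second is 0.
normal_gammas = semi_invariants([0, 1, 0, 3, 0, 15])
```

Under the normalisation (2) this gives $\gamma_2 = 1$, $\gamma_3 = \alpha_3$ and $\gamma_4 = \alpha_4 - 3$.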

Lemma 2. Let $(\zeta_1, \zeta_2, \ldots, \zeta_m)$ be a random point with $\mathcal{E}(\zeta_i) = 0$ and $\mathcal{E}(|\zeta_i|^k) = \beta_{ki} < \infty$ for some integer $k \ge 3$ $(i = 1, \ldots, m)$. Let $p(t_1, \ldots, t_m)$ be the characteristic function. Then for $|t_i| \le m^{-2+1/k}\beta_{ki}^{-1/k}\sqrt{n}$ $(i = 1, \ldots, m)$ we have

(22) $n\log p\left(\frac{t_1}{\sqrt{n}}, \ldots, \frac{t_m}{\sqrt{n}}\right) = \sum_{r=2}^{k-1}\frac{i^r U_r}{r!\,n^{(r-2)/2}} + \frac{\Theta_k V_k}{n^{(k-2)/2}}$

where $U_r$ and $V_r$ are the $r$th semi-invariant and the absolute moment respectively of $\sum t_i\zeta_i$.

Proof: If $|t_i| \le m^{-2+1/k}\beta_{ki}^{-1/k}\sqrt{n}$, then $V_k^{1/k} \le m^{(k-1)/k}\left(\sum\beta_{ki}|t_i|^k\right)^{1/k} \le m^{(k-1)/k}\left(\sum\beta_{ki}^{1/k}|t_i|\right) \le \sqrt{n}$. Since $p(t_1/\sqrt{n}, \ldots, t_m/\sqrt{n})$ is the value at $t = 1/\sqrt{n}$ of the characteristic function of $\sum t_i\zeta_i$, it follows from Lemma 1 that for $\sqrt{n} \ge V_k^{1/k}$ we have (22).

Lemma 3. Let $(\zeta_1, \ldots, \zeta_m)$ be a random point with $\mathcal{E}(\zeta_i) = 0$, $\mathcal{E}(\zeta_i^2) = 1$ and $\mathcal{E}(|\zeta_i|^k) = \beta_{ki} < \infty$ for some integer $k \ge 3$. Let $\rho_{ij} = \mathcal{E}(\zeta_i\zeta_j)$ ($\rho_{ii} = 1$; $i, j = 1, \ldots, m$) and the matrix $\|\rho_{ij}\|$ be positive definite. Let

(23) $\Delta = \det|\rho_{ij}|, \qquad \varphi(t_1, \ldots, t_m) = e^{-\frac12\sum_{i,j=1}^{m}\rho_{ij}t_it_j}.$

Let $p(t_1, \ldots, t_m)$ be the characteristic function. Then there exists a $B_{km}$ such that

for | L [ < (i = 1, • • • , m) we have 

Pki 

jp (>7n ’ " ' ’ Vn)} = <p ^ 1 ’ ■ ■' > 1 + t(iti, ■ ■ ■, itm .) I 

(24) 

+ \ur i + + i<,-r^)}e~ A/4mm " 12 - 

where $\psi(it_1, \ldots, it_m)$ is a polynomial each of whose terms has the form

$n^{-\nu/2}\,a_{\nu\nu_1\cdots\nu_m}(it_1)^{\nu_1}\cdots(it_m)^{\nu_m},$

with $1 \le \nu \le k - 3$, $3 \le \nu_1 + \cdots + \nu_m \le 3(k - 3)$, and $a_{\nu\nu_1\cdots\nu_m}$ depending only on $k$ and the moments $\mathcal{E}(\zeta_1^{\mu_1}\cdots\zeta_m^{\mu_m})$, $3 \le \mu_1 + \cdots + \mu_m \le k - 1$. If $k = 3$, then $\psi = 0$.

Proof. If | U | < m~ 2+{llk) 0k? ,k & Vn, then | U | < m~ 2+(m Pkl lk y/n since 
A < 1 and fax > 1. It follows from Lemma 2 and the fact J7 2 = 2 paUtj that 


(25) 

K^’ ■ 

’ Vn) 

II 

r 

V-' 

CT> 

m 



^.•••,o{l+g i! + _1 2) I } 

where 




(26) 


8 = 

W , 0*7* 

(r + 8) !n r/ * + n 1 ^ ‘ 
Regarding $s$ formally as a polynomial in $n^{-1/2}$, let us expand each $(j!)^{-1}s^j$ $(1 \le j \le k - 3)$ to get a polynomial of degree $k - 3$ in $n^{-1/2}$ and a remainder $r_j$. For the formal polynomial $s$ we have the majorating relation


whence 


which gives 


‘f?ryv. 

V"nr-or!n f/2 \/wm r!n r/2 \/n 3 


Is’ «A k \e iv > n ~\ 

j! n il2 


' 1 - n»« v!n" 2 - n‘«"« 


Since $V_k^{1/k}n^{-1/2} \le 1$ as shown in the proof of Lemma 2, we have

A v (k-2+ij)ik Akm(£ Pki\ti\ k ) {L - 2+23),k 


nl(*-2) - 


AuZti'Slkl) 


,1/fci, |\*-* + */ 


n i(fc~2) 


Since ft* > 1 we have < ^*~ 2)/ *. Hence 


Similarly 


a V a % (*-*)/* | * |3(*-2) 

]Jt—2 Z-r P*i U 

l c 1 < ‘ 

(A; — 2)! — n* ( *- 2) 


From (25), (28), (29) we get 

K4 • - •- $}f - * ■ 4 + § - + § r - + <* S'" 1 

= <p(h , tm) {1 + , * * •, #*)} 

+ Jfe w“-“<w‘+i<.r + ••• + ikir*)! *«.. •••. o» ul 

where $\psi(it_1, \ldots, it_m)$ stands for $\sum s_j$. The assertion about $\psi(it_1, \ldots, it_m)$ announced in the lemma can now be seen without difficulty. It remains to show that with suitable $B_{km}$ in the lemma, we have

, . -A/4m*-l S l* 

Ah, < e 
i.e.

(30) 

From (27) we have 


\ . 5 , ***** + 1*1 - ~Z^=i 5 *’ • 


4 m m 


(31) 




8 /* 




If we choose B km < (4 m!*~ l Akm)~ l (and B km < mT 2+{llk) in order that the earlier 
results may not be affected), the A km here coinciding with the last written A km 
in (31), we have, for ] U | < B km ^ ,k A\^n y 

« '*i - s^=«S‘‘ • 

On the other hand, if $\lambda_1, \lambda_2, \ldots, \lambda_m$ are the latent roots of $\|\rho_{ij}\|$ then each $\lambda_i \le m$ since their sum is $m$. Letting $\lambda_1$ be the smallest one we have


(33) 


i 5 «<.a 1 x. Z 6 -E « > 5 ^ £ 6 


2 m m 


(32) and (33) imply (30). Hence the lemma is proved.

Let us write down the particular cases $m = 1$ and $m = 2$ of (24):


(34) 




(35) 


«»<*-» 


ni<*- 


- e -, ‘*(l + iiit)) 

^'•jur+ur + 


+ \ t 


[ (h tt \\" . .... 

MvS’Vs))"* U+Krf.,*)} 

b> {£ a^i r + 1 r + ••• + !<.• r-»)}e ~ (W ’ ,( ' X)/8 
(im< . p = ‘(f>f*))- 


More specially let us rewrite (34) and (35) with $k = 3$:

/ / <i <* \\* -»(i 1 , +<;+2 P < 1 (.) 

Hv^)/ =e 


( 37 ) 
In this paper only these last four formulae are needed; they are used in the proofs of Theorems 2, 1, 3, 4 respectively. Cases of $m > 2$ of (24) will be needed for the works on other functions alluded to in the introduction.

1.2. In the following group of lemmas, which culminate in Lemma 7, one finds a generalisation of the Riemann-Lebesgue theorem, viz. Lemma 6.

Lemma 4. Let $f(x)$ be a polynomial of degree $m > 0$, with real coefficients:

(38a) $f(x) = \sum_{i=0}^{m} a_i x^{m-i} \quad (a_0 \ne 0).$

Then

(38) $\left|\int_0^1 e^{if(x)}\,dx\right| \le A_m|a_0|^{-1/m}.$


Proof: It is sufficient to prove the inequality for $\int_0^1\cos f(x)\,dx$. Divide the interval into $A_m$ sub-intervals in each of whose interior none of the derivatives $f^{(i)}(x)$ $(i = 1, \ldots, m)$ vanishes. It is sufficient to consider one of these sub-intervals, say $(a, b)$. Consequently each of the polynomials $f^{(i)}(x)$ is monotonic in $(a, b)$. Let

(39) $I = \int_a^b\cos f(x)\,dx.$


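Suppose first that $f'(x)$ is positive and increasing for $a \le x \le b$. Then

$|I| \le \varepsilon + \left|\int_{a+\varepsilon}^{b}\frac{f'(x)\cos f(x)}{f'(x)}\,dx\right| = \varepsilon + \frac{1}{f'(a+\varepsilon)}\left|\int_{a+\varepsilon}^{b_1} f'(x)\cos f(x)\,dx\right| \quad (a + \varepsilon \le b_1 \le b),$

by the second mean-value theorem. Hence

(40) $|I| \le \varepsilon + \frac{2}{f'(a + \varepsilon)}.$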


NowO </'(a + *c) = /'(a + «) - e/"(a + 0c)/2, J < 0 < 1. Hence /'(a + 
c) > Jc/"(a + 0c). Since /"(x) is monotonic, we have either/'(a + c) > 

(o + c) or/'(a + c) > i^f"(a + Je). In other words, there exists a constant Cj, 
independent of a or c, such that £ < C* < 1 and/'(o + «) > £«/(a + C*c). 

If /'"(x) > 0, we have, as before f"(a + C 2 c) > iCjcf'"(a + C 8 c), where Ci 
is independent of a or c and J < C* < 1. If /"'(x) < 0, then, since 
0 < /"(a + 2CW - /"(a + C 2 e) + C 2 c/'"(a + 0Ac), * < 0i < 1, we have 
/"(a + C 2 c) > — C 2 c/'"(a + 20iCac). As /'"(x) is monotonic, either /"(a + 
Cjc) > -Ctc/'"(a + CW or/"(a + C 2 c) > -Ct«T'(a + 2(7,«). In ail cases 
we obtain /"(a + C 2 «) > | /"'(a + C 8 c) |, where B% and C 8 are independent 

of a or c, and J < C 8 < 2. Hence /'(a + c) > i£ 8 c* | /'"(o + C,c) |. Arguing 
with =fcf"'(a + C 8 c) as we did with/"(a + C,c), and so on until we come to/ (m> , 
we obtain $f'(a + \varepsilon) \ge B_m\varepsilon^{m-1}|f^{(m)}(a + C_m\varepsilon)| = B_m\,m!\,\varepsilon^{m-1}|a_0|$. Substituting in (40) and putting $\varepsilon = |a_0|^{-1/m}$ we obtain $|I| \le A_m|a_0|^{-1/m}$. The proof presupposes that $C_m\varepsilon \le b - a$. If the reverse inequality is true, then $|I| \le b - a \le C_m|a_0|^{-1/m}$. Hence the lemma is true for $f'(x)$ positive and increasing in $(a, b)$.


If $f'(x)$ is positive and decreasing, we write

$I = \int_0^{b-a}\cos(-f(b - y))\,dy,$

$-f(b - y)$ being a polynomial with the leading coefficient $\pm a_0$ and the first derivative $f'(b - y)$, which is positive and increasing. This case reduces therefore to the preceding one. Finally, if $f'(x)$ is negative, we have only to notice that $I = \int_a^b\cos(-f(x))\,dx$. Hence the lemma is proved.
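The bound of Lemma 4 can be observed numerically. The following sketch approximates $\int_0^1 e^{if(x)}\,dx$ by the midpoint rule for $f(x) = a_0x^2$ and prints $|I|\cdot|a_0|^{1/m}$ with $m = 2$; the function name and the choice of quadrature are assumptions made for illustration.

```python
import cmath

def oscillatory_integral(coeffs, steps: int = 20_000) -> complex:
    """Midpoint-rule value of the integral of exp(i f(x)) over (0, 1).

    `coeffs` lists a_0, ..., a_m of f(x) = a_0 x^m + ... + a_m, as in (38a).
    """
    h = 1.0 / steps
    total = 0j
    for j in range(steps):
        x = (j + 0.5) * h
        fx = 0.0
        for a in coeffs:          # Horner evaluation of f(x)
            fx = fx * x + a
        total += cmath.exp(1j * fx)
    return total * h

# Degree m = 2 with a growing leading coefficient a_0.
for a0 in (25.0, 100.0, 400.0):
    value = abs(oscillatory_integral([a0, 0.0, 0.0]))
    print(a0, value * a0 ** 0.5)   # |I| * |a_0|^(1/m) should stay bounded
```

The printed products remain of order one while $|I|$ itself decreases, in agreement with the factor $|a_0|^{-1/m}$.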



Lemma 5. Let $f(x)$ be the polynomial (38a), and let $a_r \ne 0$ for some $r$, $0 \le r < m$. Then


(4i) I £‘" Ud ‘ I*,.. 

Proof: We may assume that $|a_r| \ge 1$, (41) being trivial if $|a_r| < 1$. If $r = 0$ this reduces to Lemma 4. Suppose that the lemma is true for $a_0, a_1, \ldots, a_{r-1}$. Let $f_1(x) = a_0x^m + \cdots + a_{r-1}x^{m-r+1}$, $f_2(x) = f(x) - f_1(x)$ and divide $(0, 1)$ into $A_m$ sub-intervals in each of which $f_1(x)$ is monotonic. It is sufficient to consider one of these sub-intervals, say, $(a, b)$. We have

$I = \int_a^b\cos\{f_1(x) + f_2(x)\}\,dx = \int_a^b\cos f_1(x)\cos f_2(x)\,dx - \int_a^b\sin f_1(x)\sin f_2(x)\,dx.$

We have only to consider the integral of cosines, say $J$. Divide $(a, b)$ into sub-intervals in each of whose interior $\cos f_1(x)$ is monotonic and does not vanish.
The number of such intervals does not exceed (^7r) -1 | fi(b) — fi(a) | < 

I + |/i(o) |) < 2(| ao | + • • • + | dr-i |). Then, by the second 
mean-value theorem, 

J ' r&i I 

cos / 2 (x) dx I (a < bi < b). 

a I 

Hence, applying Lemma 4 to }i(x), we get 

/AO\ I rl ^ ^»(|oo| + ••• + | Cr-l | ) ^ A m ( | On | + • • • + | Or-l |) 

W 1 J I - | Or - ‘ I Or | 1,m • 

On the hypothesis of induction we have 1 1 \ < A m \ ai p®" (t = 0, • • • , r — 1). 
If | o< | > | a, \ llim for some i < r, then | l \ < A n \a, j - -®"'*™ ; if | a< | < 
| a, then by (42), 1 1 \ < A m \a r p 1,Sm . The proof is therefore complete. 





Lemma 6. Let $f(x)$ be the polynomial (38a) and $g(x)$ be summable over $(-\infty, \infty)$. Then for every $r$ we have

(43) $\lim_{|a_r|\to\infty}\int_{-\infty}^{\infty} e^{if(x)}g(x)\,dx = 0, \quad \text{uniformly in } a_i\ (i \ne r).$

Proof: By Lemma 5 we have

$\lim_{|a_r|\to\infty}\int_0^1 e^{if(x)}\,dx = 0, \quad \text{uniformly in } a_i\ (i \ne r).$

Hence

(44) $\lim_{|a_r|\to\infty}\int_a^b e^{if(x)}\,dx = 0, \quad \text{uniformly in } a_i\ (i \ne r),$

for if $a \ne 0$ and $b \ne 0$, then $(a, b)$ is the sum or the difference of two intervals of the form $(0, c)$ or $(c, 0)$, and for the latter intervals the transformation $x = \pm cy$ reduces the interval of integration to $(0, 1)$.

Let $G$ be any open set of finite measure. Then $G$ is the sum of a sequence $\{I_\nu\}$ of non-overlapping intervals. Since $\sum mI_\nu = mG < \infty$, we have

$\sum_{\nu > n} mI_\nu < \varepsilon, \quad n \ge N.$

Hence

$\left|\int_G e^{if(x)}\,dx\right| \le \varepsilon + \sum_{\nu=1}^{N}\left|\int_{I_\nu} e^{if(x)}\,dx\right|,$

which, together with (44), implies

(45) $\lim_{|a_r|\to\infty}\int_G e^{if(x)}\,dx = 0$ uniformly in $a_i$ $(i \ne r)$.

Let $S$ be any set of finite measure. Then there is an open set $G$ such that $G \supset S$ and $m(G - S) < \varepsilon$. Hence

$\left|\int_S e^{if(x)}\,dx\right| \le \varepsilon + \left|\int_G e^{if(x)}\,dx\right|.$

Hence, by (45),

(46) $\lim_{|a_r|\to\infty}\int_S e^{if(x)}\,dx = 0$ uniformly in $a_i$ $(i \ne r)$.

Now let $h(x)$ be any positive “simple” summable function, i.e. $h(x) = a_\nu > 0$ for $x \in S_\nu$ $(\nu = 1, 2, \ldots, n)$ and $h(x) = 0$ otherwise. Since $h(x)$ is summable, each $S_\nu$ must be of finite measure. Hence

$\left|\int_{-\infty}^{\infty} e^{if(x)}h(x)\,dx\right| \le \sum_{\nu=1}^{n} a_\nu\left|\int_{S_\nu} e^{if(x)}\,dx\right|,$

which, together with (46), implies

$\lim_{|a_r|\to\infty}\int_{-\infty}^{\infty} e^{if(x)}h(x)\,dx = 0$ uniformly in $a_i$ $(i \ne r)$.

Finally, let $g(x)$ be any summable function $\ge 0$. Then by a well-known theorem⁶ we have $g(x) = \lim h_n(x)$, where $\{h_n(x)\}$ is an ascending sequence of positive summable simple functions. Hence

$\left|\int_{-\infty}^{\infty} e^{if(x)}g(x)\,dx\right| \le \left|\int_{-\infty}^{\infty} e^{if(x)}h_n(x)\,dx\right| + \int_{-\infty}^{\infty}(g(x) - h_n(x))\,dx.$

By monotonic convergence the last integral tends to 0 as $n \to \infty$. Hence

$\left|\int_{-\infty}^{\infty} e^{if(x)}g(x)\,dx\right| \le \varepsilon + \left|\int_{-\infty}^{\infty} e^{if(x)}h_n(x)\,dx\right|,$

which implies (43). If $g(x)$ is any summable function, we have only to consider the customary expression of $g(x)$ as the difference of two non-negative functions. This completes the proof.

Lemma 7. Let $P(x)$ be a non-singular distribution function of a random variable $X$, and let

(47) $p(t_1, \ldots, t_m) = \int_{-\infty}^{\infty} e^{i\sum_{r=1}^{m}t_rx^r}\,dP.$

Then for every $r$ and every positive constant $c$ we have

(48) $\underset{|t_r| \ge c}{\text{l.u.b.}}\ |p(t_1, \ldots, t_m)| < 1.$

Proof: We have $P(x) = a_1P_1(x) + a_2P_2(x)$, where $P_1(x)$ is absolutely continuous, $P_2$ is singular, $a_1 > 0$, $a_1 + a_2 = 1$. Hence

$|p(t_1, t_2, \ldots, t_m)| \le a_1\left|\int_{-\infty}^{\infty} e^{i\sum t_rx^r}P_1'(x)\,dx\right| + a_2.$

By Lemma 6 we may find $C > 0$ such that

$|p(t_1, t_2, \ldots, t_m)| \le \tfrac12 a_1 + a_2 < 1$, if any $|t_i| > C$.

Suppose that

$\underset{|t_r| \ge c}{\text{l.u.b.}}\ |p(t_1, \ldots, t_m)| = 1,$

then $c \le C$ and we must have

(49) $\underset{c \le |t_r| \le C,\ |t_i| \le C\ (i \ne r)}{\text{l.u.b.}}\ |p(t_1, \ldots, t_m)| = 1.$

Since $p(t_1, \ldots, t_m)$ is a continuous function, it must attain its least upper bound in any bounded closed set. It follows that there is a point $(t_1^0, \ldots, t_m^0)$ such that $t_r^0 \ne 0$ $(|t_r^0| \ge c)$ and $|p(t_1^0, \ldots, t_m^0)| = 1$. But this implies⁷ that the distribution of $\sum t_r^0X^r$ is discrete, i.e. that the distribution of $X$ itself is discrete, which contradicts the non-singularity of $P(x)$. Hence (49) is false and (48) is true.

⁶ H. Kestelman: Modern Theories of Integration (1937), p. 108.

⁷ Cf. (C), p. 26.
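The role of non-singularity in Lemma 7 can be seen in the case $m = 1$ by comparing two parent distributions with mean 0 and variance 1. The sketch below is only an illustration on a finite grid of positive $t$ (which suffices because $|p(-t)| = |p(t)|$); the function names and the grid parameters are assumptions.

```python
import math

def cf_uniform(t: float) -> float:
    """Characteristic function of the uniform law on (-sqrt(3), sqrt(3))."""
    s = math.sqrt(3.0) * t
    return 1.0 if s == 0.0 else math.sin(s) / s

def cf_two_point(t: float) -> float:
    """Characteristic function of the law giving mass 1/2 to each of +1 and -1."""
    return math.cos(t)

def sup_modulus(cf, c: float, upper: float = 50.0, steps: int = 100_000) -> float:
    """Grid approximation to l.u.b. |cf(t)| over c <= t <= upper."""
    h = (upper - c) / steps
    return max(abs(cf(c + j * h)) for j in range(steps + 1))

print(sup_modulus(cf_uniform, c=1.0))    # absolutely continuous law: below 1
print(sup_modulus(cf_two_point, c=1.0))  # discrete law: approaches 1 near t = pi
```

For the two-point law the least upper bound equals 1 for every $c$, so the conclusion (48) fails, as it must for a discrete distribution.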

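1.3. In his cited work Berry⁸ shows that if $F(x)$ is any distribution function and if $\Phi(x)$ is the function (6), then there is a constant $a$ such that

(50) $\left|\int_{-\infty}^{\infty}\frac{1 - \cos Tx}{x^2}\{F(x + a) - \Phi(x + a)\}\,dx\right| \ge \sqrt{\frac{2}{\pi}}\,T\delta\left\{3\int_0^{T\delta}\frac{1 - \cos x}{x^2}\,dx - \pi\right\}$

where $\delta = \sqrt{\pi/2}\ \text{l.u.b.}\,|F(x) - \Phi(x)|$. This is easily extended to the following lemma, which needs no further proof.

Lemma 8. Let $F(x)$ be a distribution function and $F_1(x)$ be a function having the following properties: (i) $F_1(x)$ is bounded for all $x$, (ii) $F_1(x) \to 1$ as $x \to \infty$, $F_1(x) \to 0$ as $x \to -\infty$, (iii) $F_1(x)$ has a bounded derivative, $|F_1'(x)| \le M$. Let

$\delta = \frac{1}{2M}\ \text{l.u.b.}\,|F(x) - F_1(x)|.$

Then there exists a constant $a$ such that

(51) $\left|\int_{-\infty}^{\infty}\frac{1 - \cos Tx}{x^2}\{F(x + a) - F_1(x + a)\}\,dx\right| \ge 2MT\delta\left\{3\int_0^{T\delta}\frac{1 - \cos x}{x^2}\,dx - \pi\right\}.$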

1.4. In section 3 we define, for given $\varepsilon$, $k$, $\lambda$ and $z$, a function

(52) $G(x, y) = e^{-\varepsilon y^{2k}}$ if $z < x \le z + \lambda y^2$, $G(x, y) = 0$ otherwise.

The introduction of $G(x, y)$ and the appraisal of its Fourier transform constitute the essence of our method of solving the problem of the asymptotic expansion of the distribution function $G(x)$. The solution of the same problem about other functions of (1) alluded to in section 3 is based on the introduction of functions playing the role of $G(x, y)$. We now prove the following lemma:

Lemma 9. Let $G(x, y)$ be defined by (52) and let

(53) $g(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-it_1x - it_2y}G(x, y)\,dx\,dy.$

Then

(i) |<Kfa,fa)| 

(u) |?(fa,fa)| < + ~^) if *=3, 

(iii) | g(h, fa) | < j^y 2 ^73 + . 



⁸ (B), p. 128.

Proof:

(i) I g(ti , fa) | < f G(x, y) dxdy = X [ 2 /V' v ” dy = ^ 

*^2 oO * 

(ii) Putting & = 3 we have 

f(«i, fe) = ~ - <r“ XvJ ) dy, 

| y(<i, < 2 ) | < | £ u(y)v"'(y) dy , 

where u(y) = e“ i|,# (l — t;( 2 /) = e~* <81 '. On integrating by parts we 

obtain 

(64) | g{U ,U)\< i^, | £ v(yW"(y) dy | < £ | u>"{y) \ dy. 

Elementary calculation establishes that 

I—I < e~‘ l ''(216X« s |y| ,7 + 756Xe 2 | y \ 

I h I 

+ 336Xe | y | 6 + 8 X 8 1 <i |* | y |* + 12X 2 1 h 11 y |). 

Substituting in (54) and making the transformation $y = \varepsilon^{-1/6}x$ we get the result.

(iii) We have 

I g(tr ,fe)|< t -i| £ «-» 2k -«*»(l - e*""') dy 
Integrating by parts twice we obtain 

i" 0 - •«‘ ^ naW £ ,e "'” a - I ^ 

By elementary calculations we get 

| g(ti, U) I < |£ 2 £ (4fcW* + 2k(k + 3 )X«y“ + 4X 2 |<i| y 2 + 2X)e" ,, '“ dy 

which, on the transformation $y = \varepsilon^{-1/2k}x$, gives the result.

1.5. We prove a few additional lemmas used in the proof of Theorems 3 and 4.

Lemma 10.⁹ Let $u(x_1, \ldots, x_m) \ge 0$ be summable in the $m$-dimensional space and let

(55) $v(t_1, \ldots, t_m) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-i(t_1x_1 + \cdots + t_mx_m)}u(x_1, \ldots, x_m)\,dx_1\cdots dx_m.$

If $v(t_1, \ldots, t_m)$ is summable in the $m$-dimensional space, then

(56) $u(x_1, \ldots, x_m) = \frac{1}{(2\pi)^m}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{i(t_1x_1 + \cdots + t_mx_m)}v(t_1, \ldots, t_m)\,dt_1\cdots dt_m.$

⁹ Although the author believes that this lemma is almost classical, a proof is given owing to lack of reference.

Proof: Except for a constant factor the function $u(x_1, \ldots, x_m)$ may be regarded as a probability density function. Hence by the well-known inversion formula of (55),


J ' * * J* u(xi , • • *, x m ) dx i • • • dx m 

(57) a i^ x i^i (*-l, •••.*») 

1 f 00 f 00 e Uihi — e itia *\ 

= WrL L\U itj ) v(tl ’ 


Now $u(x_1, \ldots, x_m)$ is almost everywhere the symmetric derivative of the interval function in the left-hand side of (57):


u(x i, 


Hence 


■ j Xm) — lim 


1 


e -*0 (2e) n 


/•••/ u( Vl , ■ 


• • > Vm) dyi ■■■ dy m . 

•,m) 


(58) 


u(xi 


> ^m) 


1 l im -L 

(2*-) m 1™ (2e)” 


r-r 

•*—00 •*-« 






U») dil ••• dim. 


Owing to dominated convergence the order of the limit sign and the integration sign in (58) may be inverted. Hence (56) is true.

Lemma 11. We have

(59) $\int_{-\infty}^{\infty} e^{itu}\,\frac{1 - \cos Tu}{u^2}\,du = \pi(T - |t|)$ if $|t| \le T$, and $= 0$ if $|t| > T$.

Proof: The Fourier transform of the function in the right-hand side of (59) is

$\pi\int_{-T}^{T} e^{itu}(T - |t|)\,dt = \frac{2\pi}{u^2}(1 - \cos Tu).$

Hence (59) follows from (56).
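The transform used in the proof of Lemma 11 is an integral over a finite interval and is easy to confirm numerically. The sketch below compares a midpoint-rule value of $\pi\int_{-T}^{T}e^{itu}(T - |t|)\,dt$ with the closed form $\frac{2\pi}{u^2}(1 - \cos Tu)$; the function names and step count are illustrative assumptions.

```python
import cmath
import math

def triangle_transform(u: float, T: float, steps: int = 20_000) -> complex:
    """Midpoint-rule value of pi * integral over (-T, T) of exp(i t u) (T - |t|) dt."""
    h = 2.0 * T / steps
    total = 0j
    for j in range(steps):
        t = -T + (j + 0.5) * h
        total += cmath.exp(1j * t * u) * (T - abs(t))
    return math.pi * total * h

def closed_form(u: float, T: float) -> float:
    """The value asserted in the proof: (2 pi / u^2) (1 - cos T u)."""
    return 2.0 * math.pi * (1.0 - math.cos(T * u)) / (u * u)

difference = abs(triangle_transform(1.5, 2.0) - closed_form(1.5, 2.0))
```

The imaginary part of the numerical value cancels by symmetry, so `difference` measures only the quadrature error.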
Lemma 12.

(60) $|\mathcal{E}(\xi_1 + \cdots + \xi_n)^k| \le A_kn^{k/2}\beta_k.$

Proof. As (60) is true for $k = 1$, let us assume, for induction, that it is true for $1, 2, \ldots, k$. Then, by symmetry,

$\mathcal{E}(\xi_1 + \cdots + \xi_n)^{k+1} = n\,\mathcal{E}\{\xi_1(\xi_1 + \cdots + \xi_n)^k\} = n\sum_{r=0}^{k}\binom{k}{r}\mathcal{E}(\xi_1^{r+1}U^{k-r})$

where $U = \xi_2 + \cdots + \xi_n$. Since $\mathcal{E}(\xi_1) = 0$, we have

$\mathcal{E}(\xi_1 + \cdots + \xi_n)^{k+1} = n\sum_{r=1}^{k}\binom{k}{r}\mathcal{E}(\xi_1^{r+1})\,\mathcal{E}(U^{k-r}).$

On the hypothesis of induction we have $|\mathcal{E}(U^{k-r})| \le A_k(n - 1)^{(k-r)/2}\beta_{k-r} \le A_kn^{(k-1)/2}\beta_{k-r}$. Hence

$|\mathcal{E}(\xi_1 + \cdots + \xi_n)^{k+1}| \le A_kn^{(k+1)/2}\sum_{r=1}^{k}\binom{k}{r}\beta_{r+1}\beta_{k-r} \le A_{k+1}n^{(k+1)/2}\beta_{k+1}.$

Therefore the induction is complete.
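For $k = 4$ the bound (60) can be compared with an exact value. The sketch below assumes each $\xi_r$ is $+1$ or $-1$ with probability $1/2$ (so $\alpha_4 = \beta_4 = 1$) and uses the elementary identity $\mathcal{E}(\xi_1 + \cdots + \xi_n)^4 = n\alpha_4 + 3n(n - 1)$, valid under (2); the function name is hypothetical.

```python
from math import comb

def fourth_moment_of_sum(n: int) -> float:
    """Exact E(xi_1 + ... + xi_n)^4 when each xi_r is +1 or -1 with probability 1/2."""
    # The sum equals 2B - n with B binomial(n, 1/2).
    return sum(comb(n, b) * (2 * b - n) ** 4 for b in range(n + 1)) / 2 ** n

for n in (1, 10, 100):
    exact = fourth_moment_of_sum(n)
    # Agrees with n * alpha_4 + 3 n (n - 1) for alpha_4 = 1.
    assert exact == n + 3 * n * (n - 1)
    print(n, exact / n ** 2)   # ratio to n^(k/2) * beta_k with k = 4
```

The printed ratio never exceeds 3, which is consistent with (60) for a suitable $A_4$.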


3. Elementary Proof of Theorem 1. 2.1. We have defined

(61) $F(x) = \Pr\{\sqrt{n}\,\bar\xi \le x\}, \qquad \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy$

with the characteristic functions

(62) $f(t) = \{p(t/\sqrt{n})\}^n, \qquad \varphi(t) = e^{-t^2/2}.$

Following Berry¹⁰ we use the equation

(63) $\int_{-\infty}^{\infty}\{F(x) - \Phi(x)\}e^{itx}\,dx = \frac{f(t) - \varphi(t)}{-it}.$

Let $\psi(it)$ be the polynomial in (34), and let us define $\Psi(x)$ as the function obtained from $\psi(it)$ through the replacement of each power $(it)^\nu$ by $(-1)^\nu\Phi^{(\nu)}(x)$. Integration by parts shows $(-1)^{\nu-1}\int_{-\infty}^{\infty} e^{itx}\Phi^{(\nu)}(x)\,dx = (it)^{\nu-1}\varphi(t)$, whence

(64) $\int_{-\infty}^{\infty}\Psi(x)e^{itx}\,dx = \frac{\varphi(t)\psi(it)}{-it}.$

From (63) and (64) we obtain

(65) $\int_{-\infty}^{\infty}\{F(x) - \Phi(x) - \Psi(x)\}e^{itx}\,dx = \frac{f(t) - \varphi(t)\{1 + \psi(it)\}}{-it}.$

The function $\Psi(x)$ defined here is precisely the $\Psi(x)$ appearing in (5) under Theorem 1. Our task is to prove that

(66) $|F(x) - \Phi(x) - \Psi(x)| \le \frac{Q_k}{n^{(k-2)/2}}.$

Following Berry¹¹ we replace $x$ by $x + a$ in (65), getting

(67) $\int_{-\infty}^{\infty}\{F(x + a) - \Phi(x + a) - \Psi(x + a)\}e^{itx}\,dx = \frac{e^{-iat}\left[f(t) - \varphi(t)\{1 + \psi(it)\}\right]}{-it}.$

We multiply both sides of (67) by $T - |t|$ and integrate with respect to $t$ in $(-T, T)$:

$2\int_{-\infty}^{\infty}\frac{1 - \cos Tx}{x^2}\{F(x + a) - \Phi(x + a) - \Psi(x + a)\}\,dx = \int_{-T}^{T}\frac{(T - |t|)\,e^{-iat}\left[f(t) - \varphi(t)\{1 + \psi(it)\}\right]}{-it}\,dt;$

the reversion of order of integration involved is obviously justifiable. Hence

(68) $\left|\int_{-\infty}^{\infty}\frac{1 - \cos Tx}{x^2}\{F(x + a) - \Phi(x + a) - \Psi(x + a)\}\,dx\right| \le T\int_0^T\frac{|f(t) - \varphi(t)\{1 + \psi(it)\}|}{t}\,dt.$

¹⁰ (B), p. 127, Equation (23).

¹¹ (B), p. 127.


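2.2. When in particular $k = 3$, (68) becomes

(69) $\left|\int_{-\infty}^{\infty}\frac{1 - \cos Tx}{x^2}\{F(x + a) - \Phi(x + a)\}\,dx\right| \le T\int_0^T\frac{|f(t) - \varphi(t)|}{t}\,dt.$

If we choose $a$ to be the $a$ in (50), the left-hand side of (69) is not less than

$\sqrt{\frac{2}{\pi}}\,T\delta\left\{3\int_0^{T\delta}\frac{1 - \cos x}{x^2}\,dx - \pi\right\}, \qquad \delta = \sqrt{\frac{\pi}{2}}\ \text{l.u.b.}\,|F(x) - \Phi(x)|.$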

On the other hand, taking T 
not greater than 


^{ n as in (36) the right-hand side of (69) is 
ft 


A j£" t i e~ i,t dt = A. 

Hence 

Now the left-hand side of (70), as a function of $T\delta$, is positive and increasing for sufficiently large $T\delta$, and becomes infinite as $T\delta \to \infty$. Hence (70) implies that $T\delta \le A$, i.e.

$\text{l.u.b.}\,|F(x) - \Phi(x)| \le \frac{A\beta_3}{\sqrt{n}},$

giving Theorem 2.


2.3. Coming back to the general case, we see that the function $\Phi(x) + \Psi(x)$ has a bounded derivative: $|\Phi'(x) + \Psi'(x)| \le Q_k$, and also has all the properties of the function $F_1(x)$ in Lemma 8. On choosing $a$ in (68) to be the $a$ in (51) we obtain
(71) Q k n

where 


| 8 j- iT £ \m - >w;»+ m i 


$\delta = Q_k\ \text{l.u.b.}\,|F(x) - \Phi(x) - \Psi(x)|.$

Let us take T = (A k ^ lk Vn) k ~ i with A k in accordance with (34). Then 


(72) 


T f 
Jo 


'\m - non + Hit)}! 


dt 

ri/(*-a) 


= Q k n M ~ t> f + Q k n" k -» f , - J k + J-. 

Jo j Q k v» 

By (34) we have 

(73) Ji < Q k f (<* -1 + • • • + < tt " 7 )e _1 “ dt = Q k . 

Jo 

Also, 


2 say. 


(74) J 2 < Q t n 4< ‘- S) f 

JQ kV* 


lp(t/y / n)|’ > 


f 


dt 


+ Q k n Uk ~ 2) [ T ^ 


<K0 11 + *(it) | 


t 




The second term in the right-hand side of (74) is evidently <Q k . The first 
term does not exceed 


(75) 


T l.u.b. | p(t) | n . 

t*Qk 


At this step we make use of the non-singularity of $P(x)$ and apply Lemma 7 for $m = 1$. We have

$\underset{t \ge Q_k}{\text{l.u.b.}}\ |p(t)| = e^{-Q_k}.$

t^Qk 

Hence (75) does not exceed Q k n h2k ~ b) e~ Qkn < Q k . We have therefore 


(76) 


TB<Z 


f Tt l 

Jo 


— cos X 


dx - tt} < Q k , Q k n Uk ~ 2 \ 


Arguing with (76) as we did with (70) we conclude that

$\text{l.u.b.}\,|F(x) - \Phi(x) - \Psi(x)| \le \frac{Q_k}{n^{(k-2)/2}}.$

(72) is valid for $T \ge 1$. If $T < 1$, we have only to suppress the term $J_2$. Hence Theorem 1 is proved.

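4. Proof of Theorem 3 and Theorem 4. 3.1. In connection with the random variables (1), we assume that $\beta_{2k} < \infty$ for some integer $k \ge 3$ and define

(77) $\bar\xi = \frac{1}{n}\sum_{r=1}^{n}\xi_r, \qquad \eta = \frac{1}{n}\sum_{r=1}^{n}(\xi_r - \bar\xi)^2, \qquad G(z) = \Pr\left\{\frac{\sqrt{n}\,(\eta - 1)}{\sqrt{\alpha_4 - 1}} \le z\right\}.$

Now,

$\frac{\sqrt{n}\,(\eta - 1)}{\sqrt{\alpha_4 - 1}} = X - \lambda Y^2,$

where

(78) $X = \frac{1}{\sqrt{n}}\sum_{r=1}^{n}\frac{\xi_r^2 - 1}{\sqrt{\alpha_4 - 1}}, \qquad Y = \frac{1}{\sqrt{n}}\sum_{r=1}^{n}\xi_r.$

Hence

(79) $G(z) = \Pr\{X - \lambda Y^2 \le z\}$

with

(80) $\lambda = \frac{1}{\sqrt{n}\,\sqrt{\alpha_4 - 1}}.$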
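The representation of the standardised variance as $X - \lambda Y^2$ is a purely algebraic identity and can be checked on any sample. The sketch below assumes $X = n^{-1/2}\sum(\xi_r^2 - 1)/\sqrt{\alpha_4 - 1}$, $Y = n^{-1/2}\sum\xi_r$ and $\lambda = 1/(\sqrt{n}\sqrt{\alpha_4 - 1})$, which is the choice consistent with (77) and (79); the function names and the sample values are illustrative.

```python
import math

def standardized_variance(xs, alpha4: float) -> float:
    """sqrt(n) (eta - 1) / sqrt(alpha_4 - 1), with eta as in (77)."""
    n = len(xs)
    mean = sum(xs) / n
    eta = sum((x - mean) ** 2 for x in xs) / n
    return math.sqrt(n) * (eta - 1.0) / math.sqrt(alpha4 - 1.0)

def x_minus_lambda_y_squared(xs, alpha4: float) -> float:
    """X - lambda Y^2 under the assumed definitions of X, Y and lambda."""
    n = len(xs)
    scale = math.sqrt(n) * math.sqrt(alpha4 - 1.0)
    X = sum(x * x - 1.0 for x in xs) / scale
    Y = sum(xs) / math.sqrt(n)
    lam = 1.0 / scale
    return X - lam * Y * Y

sample = [0.3, -1.2, 0.8, 2.1, -0.5, -0.9]
alpha4 = 3.0   # e.g. the fourth moment of a standard normal parent
lhs = standardized_variance(sample, alpha4)
rhs = x_minus_lambda_y_squared(sample, alpha4)
```

The two values agree up to rounding for every sample, since $\eta = \frac1n\sum\xi_r^2 - \bar\xi^2$ and $\bar\xi^2 = Y^2/n$.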
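Let $W$ be the probability function of the distribution of the random point $(X, Y)$ and $f(t_1, t_2)$ be the characteristic function:

(81) $W(S) = \Pr\{(X, Y) \in S\}$ for every Borel set $S$ in $R_2$,

(82) $f(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i(t_1x + t_2y)}\,dW = \left\{p\left(\frac{t_1}{\sqrt{n}}, \frac{t_2}{\sqrt{n}}\right)\right\}^n,$

(83) $p(t_1, t_2) = \int_{-\infty}^{\infty} e^{i\{t_1(x^2 - 1)/\sqrt{\alpha_4 - 1} + t_2x\}}\,dP.$

Let $G_1(z)$ be the distribution function of $X$. Then

(84) $G(z) - G_1(z) = \iint_{z < x \le z + \lambda y^2} dW = K(z)$, say.

Let

(85) $K_\varepsilon(z) = \iint_{z < x \le z + \lambda y^2} e^{-\varepsilon y^{2k}}\,dW.$

If we define (for fixed $z$) the function $G(x, y)$ by

(86) $G(x, y) = e^{-\varepsilon y^{2k}}$ if $z < x \le z + \lambda y^2$, $G(x, y) = 0$ otherwise,

then

(87) $K_\varepsilon(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(x, y)\,dW.$

Letting

(88) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-it_1x - it_2y}G(x, y)\,dx\,dy = g(t_1, t_2),$

we replace $x$ by $x - u$ in the integral and get

(89) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-it_1x - it_2y}G(x - u, y)\,dx\,dy = e^{-it_1u}g(t_1, t_2).$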

J—OO j— to 


1 *** cos Tu 

Multiplying both sides by--- and integrating with respect to u we 

vr 

obtain, with the help of (59), Lemma 11, 

f f d x dy f -- - ° 8 - ,T u Q( x — Uf y) du 

«*- 00 J—00 j—to U* 

(90) 

_UT- | k\)g(k,k) if k\ < T, 
10 if > T; 


the reversion of order of integration in the left-hand side is obviously justifiable. 
By Lemma 9 the right-hand side of (90) is summable in the whole plane of 
($i, k). Hence, by Lemma 10, 


(91) 


1 — COS Tu V , 

-- 0(x - u , y) du 

00 U z 

= - l - f J (T - \h\)g(h,t t )e <, '* +i, ' y cUidi i . 

\tl\ZT 


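If we integrate both sides with respect to the probability function W, we obtain,
on reversing the order of integration,

(92)    $\int_{-\infty}^{\infty} \frac{1 - \cos Tu}{u^2}\, du \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(x - u, y)\, dW = \frac{1}{4\pi} \iint_{|t_1| \le T} (T - |t_1|)\, g(t_1, t_2)\, f(t_1, t_2)\, dt_1\, dt_2.$

By (86) and (87),

(93)    $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(x - u, y)\, dW = K_\epsilon(u + z).$

Hence

(94)    $\int_{-\infty}^{\infty} \frac{1 - \cos Tu}{u^2}\, K_\epsilon(u + z)\, du = \frac{1}{4\pi} \iint_{|t_1| \le T} (T - |t_1|)\, g(t_1, t_2)\, f(t_1, t_2)\, dt_1\, dt_2.$

We now take the functions

(95)    $\varphi(t_1, t_2) = e^{-\frac{1}{2}(t_1^2 + t_2^2 + 2\rho t_1 t_2)}$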




and $\psi(it_1, it_2)$ as in (35), where

(96)    $\rho = \frac{\alpha_3}{\sqrt{\alpha_4 - 1}}.$

Since the condition $\alpha_4 - 1 - \alpha_3^2 \ne 0$ is assumed in Theorem 3 and implied in
Theorem 4, we have $|\rho| < 1$. Let

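(97)    $w(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, \exp\left\{-\frac{x^2 + y^2 - 2\rho x y}{2(1 - \rho^2)}\right\}$

and let $\gamma(x, y)$ be the function obtained from $\psi(it_1, it_2)$ through the replacement
of each power $(it_1)^{\nu_1}(it_2)^{\nu_2}$ by $(-1)^{\nu_1 + \nu_2}\, w_{\nu_1\nu_2}(x, y) = (-1)^{\nu_1 + \nu_2}\, \frac{\partial^{\nu_1 + \nu_2} w}{\partial x^{\nu_1}\, \partial y^{\nu_2}}$. Since

(98)    $w(x, y) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i t_1 x - i t_2 y}\, \varphi(t_1, t_2)\, dt_1\, dt_2,$

we have

(99)    $w_{\nu_1\nu_2}(x, y) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (-it_1)^{\nu_1}(-it_2)^{\nu_2}\, e^{-i t_1 x - i t_2 y}\, \varphi(t_1, t_2)\, dt_1\, dt_2,$

whence, by Fourier inversion,

(100)    $(-it_1)^{\nu_1}(-it_2)^{\nu_2}\, \varphi(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i t_1 x + i t_2 y}\, w_{\nu_1\nu_2}(x, y)\, dx\, dy.$

From the definition of $\gamma(x, y)$ it follows therefore

(101)    $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i t_1 x + i t_2 y}\, \{w(x, y) + \gamma(x, y)\}\, dx\, dy = \varphi(t_1, t_2)\{1 + \psi(it_1, it_2)\}.$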

J—oo J-eo 

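A comparison of (101) with $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i t_1 x + i t_2 y}\, dW = f(t_1, t_2)$ shows that (94) will
remain true if $K_\epsilon(u)$ be replaced by

(102)    $\iint_{u < x \le u + \lambda y^2} e^{-\epsilon y^{2k}}\, \{w(x, y) + \gamma(x, y)\}\, dx\, dy = L_\epsilon(u)$, say,

and $f(t_1, t_2)$ be replaced by $\varphi(t_1, t_2)\{1 + \psi(it_1, it_2)\}$. Hence

(103)    $\int_{-\infty}^{\infty} \frac{1 - \cos Tu}{u^2}\, \{K_\epsilon(u + z) - L_\epsilon(u + z)\}\, du = \frac{1}{4\pi} \iint_{|t_1| \le T} (T - |t_1|)\, g(t_1, t_2)\, \{f(t_1, t_2) - \varphi(t_1, t_2)[1 + \psi(it_1, it_2)]\}\, dt_1\, dt_2.$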
Let also

(104)    $H(z) = \iint_{x - \lambda y^2 \le z} \{w(x, y) + \gamma(x, y)\}\, dx\, dy, \qquad H_1(z) = \iint_{x \le z} \{w(x, y) + \gamma(x, y)\}\, dx\, dy,$

(105)    $L(z) = H(z) - H_1(z) = \iint_{z < x \le z + \lambda y^2} \{w(x, y) + \gamma(x, y)\}\, dx\, dy.$

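3.2. We now consider the particular case k = 3 and prove Theorem 3. For
k = 3 we have $\psi = \gamma = 0$ and so

(106)    $H(z) = \iint_{x - \lambda y^2 \le z} w(x, y)\, dx\, dy, \qquad H_1(z) = \iint_{x \le z} w(x, y)\, dx\, dy = \Phi(z), \qquad L(z) = H(z) - H_1(z),$

(107)    $L_\epsilon(z) = \iint_{z < x \le z + \lambda y^2} e^{-\epsilon y^6}\, w(x, y)\, dx\, dy,$

(108)    $\int_{-\infty}^{\infty} \frac{1 - \cos Tu}{u^2}\, \{K_\epsilon(u + x) - L_\epsilon(u + x)\}\, du = \frac{1}{4\pi} \iint_{|t_1| \le T} (T - |t_1|)\, g(t_1, t_2)\, \{f(t_1, t_2) - \varphi(t_1, t_2)\}\, dt_1\, dt_2.$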

Now

$K_\epsilon(u) - L_\epsilon(u) = \{G(u) - \Phi(u)\} - \{H(u) - \Phi(u)\} - \{G_1(u) - \Phi(u)\} - \{K(u) - K_\epsilon(u)\} + \{L(u) - L_\epsilon(u)\},$


1 /• 00 /»u+Xy* 

0 < HM - *(«) - f. % /. 

a j. vV "" * ' v &< r- >) ■ 


4 r — 1 J /y< 

|G,(«)-*(«)| <^l a dP ^(a 4 l 1 )*Vn byThe0rem2> 

0 < ^l(w) — < €€(7®) < Aa ec by Lemma 12, 

0 < L(u) — L,(m) < 
Hence


/ -- ;” — |G(w + x) — $>(w + X)J du 

J-ao Vr 


(109) " eT f" + («, - 7)"Wn + V~n V («4 - 

+ 0 T J J | g(h , < 2 ) M/(fa , U) — <p(l 1 , h) | dtidti . 


It is easy to verify that 
\ 8/2 


+ 


Mil**’ 


1 


(«4 - ir 21 v(o4 - m - ?) 


(—f-,r. 

\a 4 — 1 — as/ 


For the left-hand side of (109) we refer to (50) and take x to be the number a 
therein. Hence 

4 /"^* -'} s "f* + 

(110) + AT J J | g(h , k) M/0i , k) — <p(ti , fe) | dtidk 

+ AT J J | g(ti , £ 2 ) | dtidk . 


I<i|sr f |i*|2sr 


|<l|:£r.|«al>r 


By Lemma 9 (ii) we have

T J J \g(t 1 ,t 2 )\dt 1 dt 2 


\tl\£T,\t 7 \>T 

(111) < AT 


II 


\ti\<.T,\t%\>T 




Hence 


TS 


( 112 ) 


< ^ r. + (—■_■ o .) ^j + » + A£ + ^] 
+ AT J J I g(ti , $ 2 ) 11/ — <P | dh dti . 


Mil £r,|« a | sir 
By Lemma 9 (i) with k = 3 we have

(H3) T // \a\ -I/- v\dtidt t <^± ff [f-^dhdh. 


By (37) under Lemma 3, 


\ti\ZT,\ti\ZT 


(114) I/- ^1 <^(^l^l* + ^UIV‘ <1 -' ,K,?+,5> forl<,l< A(1 ~f )V ^ 


hi 


with 


(115) 


Ai = L 


X 2 - 1 
\Z<*4 ~ 1 


' dp *(^h)<Ly +,>iF 


Wfe now take 

<“»> r - i (°‘- r -y -)' v "’ 

the A coinciding with that in (114). Then 

. 4(1 — p*)Vn v,. A(1 - p^jou — l) i y / n 

(117) 


Ai 8cu 

A {on — 1 — a|)Va« — 1 Vn > A (on — 1 — al) t \/n _ 


(118) 


8a, 

A(1 — p*)\/n _ A(a« — 1 — al)\/n 

& (a, - 1)A 


8a!« 


= r 


A (a, — 1 — ai)*\/n A(a« — 1 — a»)* \A 


»/s a — *n 1 ' 

a, p« a. 


Hence (114) is true for $|t_1| \le T$ and $|t_2| \le T$. Using this fact on (113) we
obtain


( 119 ) 


t II \a\\f ~ vldhdk 

\ti\ST,\t t \£T 

s T & £ £ {(—IT. 1‘ir+ftw} A * 

_ ATX / at . _\ 1 

^ <V^V(<* - D 8 ' 1 + (1 - p’) 6/i 

■4 t^x 1 

“ M * 4 ~ + ft( “* _ 1)6 ' S) (04 - 1 - alf 2 


AJ£ _ l 

Wt (a,Va< - 1 + ft(a4 ” 1),} (04 - 1 - a s ,) H * 


< 


nit 


AT *i 

r»V«(«4 - 1 - a 2 ,) 6 ' 8- 
Substituting in (112), setting $\epsilon = (\alpha_6 T)^{-1}$ and using (116) we obtain after some
easy reduction


TS 


( 120 ) 


' A V»(04 — 1) (»(«* — 1 — a!)) (nfa — 1 — 


If n > (04 — 1 — aj) ^, then the right-hand side of (120) is < A, and so, 
arguing with (120), as we did with (70), we obtain 


(.2D i.u-b. 1 o<«) - *<«> I s f - (- 

For n < (04 — 1 — a») _1 aj, however, the rightrhand side of (121) > A{cu — 
1 — aj) -1 ae > A and ( 121 ) becomes a triviality. Hence Theorem 3 is proved. 
3.3. To prove Theorem 4, we start again with the identity (103). We have

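(122)    $K_\epsilon(u) - L_\epsilon(u) = \{G(u) - H(u)\} - \{G_1(u) - H_1(u)\} - \{K(u) - K_\epsilon(u)\} + \{L(u) - L_\epsilon(u)\},$

(123)    $0 \le K(u) - K_\epsilon(u) \le \epsilon\, E(Y^{2k}) \le Q_k\, \epsilon$ by Lemma 12,

(124)    $0 \le L(u) - L_\epsilon(u) \le \epsilon \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y^{2k}\, (w(x, y) + |\gamma(x, y)|)\, dx\, dy \le Q_k\, \epsilon.$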


Let us show that

(125)    $|G_1(u) - H_1(u)| \le Q_k / n^{\frac{1}{2}(k-2)}.$

The function $X = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\frac{\xi_i^2 - 1}{\sqrt{\alpha_4 - 1}}\right)$ has the same structure as $\sqrt{n}\,\bar{\xi}$ (with
$(\alpha_4 - 1)^{-\frac{1}{2}}(\xi_i^2 - 1)$ playing the role of $\xi_i$); hence, by Theorem 1, there exists
an asymptotic expansion of the distribution function $G_1(u)$. We shall see that
the terms of this asymptotic expansion are precisely $H_1(u)$, whence (125) follows
from Theorem 1.

It is obvious that for the polynomial $\psi(it_1, it_2)$ in (35), $\psi(it, 0)$ coincides with
the polynomial $\psi(it)$ in (34). Hence the terms of the asymptotic expansion
of $G_1(u)$ are the inversion of $e^{-\frac{1}{2}t^2}\{1 + \psi(it, 0)\}$, viz.


(126) 


m + ±[>r e ~’‘* H 'V(t<. o) dt. 


On the other hand, by (104), 

( 127 ) Hi(u) = 4 >(u) + j_ K d* j_ m y( x > v ) 


and by ( 101 ) with U = 0 , 

(128) f e i,x dx f y(x, y) dy — 0 ). 
Inversion of (128) gives


(129) 


£ y( x ’ v)dy = l £ e- , ‘ lH< V(t<, 0) dt 


which establishes the equality of $H_1(u)$ and (126).
Using (122), (123), (124), (125) on (103) we get

cos Tu 


(130) 


£ 1 m* + *)" + z)} du = 

+ 0 T J J | g(ti , tz) | • \f(h , £ 2 ) “ <p(ti, U)[ 1 + , U 2 )] | dtidh . 


If we expand

(131) H(u) = J J {w(x, y) + y(x,y)\ dxdy 

in powers of rT* up to and including the term n“ i( *“ 3) , the remainder is obviously 
A*rf l< *- 2) . Hence 

(132) H{u) = *(u) + xW + A*/n* ( *~ 2) , 

where $(w) + xO-O is the group of terms of the Taylor expansion of (131) in 
powers of n -i up to and including the term From (130) and (132) we get 

-- ^ — (G(u + z) — $(•« + z) — x(« + z)) dw 

<^(e + ^) + .4/, 

where 

(134) I = T J J | g{t \, k) \ , £ 2 ) — <p(h , fe){l + , it*)} | dt\dt» . 

\tl\ZT 

We are going to prove that the function $\chi(u)$ here defined satisfies all the
requirements of the function $\chi(u)$ in Theorem 4. The structure of $\chi(u)$ announced
in Theorem 4 is easily verifiable. It remains to prove the inequalities
(15) and (16) satisfied by

$|G(u) - \Phi(u) - \chi(u)|.$

It is obvious that the function 4>(w) + xM has all the properties of the 
function Fj(u) in Lemma 8, having a bounded derivative 14>'(w) + x'M | < Qk . 
Hence, on taking z in (133) to be the number a in (51), the left-hand side of (133) 
does not exceed 

Q k n (3 £ 1 ~^° 8 - M du - v), « = Q*l.u.b. I G(u) - *(«) - x («) |. 
Hence


(1S5) n (3 1" -*)<«,!■(.+ n -nk,) + a/. 

In order to appraise I we recall (35) under Lemma 3 (replacing therein each 
Pki by the larger number ft, and merging the latter into Qk) 

|/(<x, k) - <p(t x, fe){1 + *(«,, tfe)} I < {S(|<x|*+ ••• 

(136) n 




+ KDI* 

for 

(137) \U\< Q k Vn. 

Put $T = (Q_k\sqrt{n})^{l}$, with $Q_k$ here coinciding with that in (137), and then (136)
is valid for $|t_1| \le T^{1/l}$ and $|t_2| \le T^{1/l}$. Write

I-T // +T // +T // 

By Lemma 9 (i), 




whence, by (136) 


w*+-+u.r*>) 


(i*+«;)/* Q*r 

e “*»* - n j<*-xy/» ■ 


By Lemma 9 (iii) we have 


/ 2 <Q*r // | <i 1 |»(^x/ 2 * 

+ {|/(fx j fe) | + <p(.k i ts) | 1 + iK*<i > ik) |} dtidi* 


Obviously, 

(140) 


l.u.b. V (fi , h) 11 + , ik )| = <T nQk . 


On the assumption of non-singularity of P(x) we have, by Lemma 7, 
l.u.b. |/(<i, fe)| = l.u.b. I p ( — j =-, - 7 =) I 
( I4 D / U \l» 
Hence


h < Q„Te- 


/ / | fe|* (vi m + dtldti 

n /n M . n (,/J)(M, \ 
Q* y e i/2Jb + y ( 


For 1 9 we have | h | > T itl = Q*\/n, and so Lemma 7 is applicable to h in the 
same manner as to J 2 . Using Lemma 9 (i) on the factor | g(h , ti) | we get 


Combining (135), (138), (139), (142), (143) we obtain 


■(•p- 


^ < < 2 * (i 


„t/» _i« 

n . n 

71 € ■" n i(*- 2 ) ' n iUc-l) € SI 2 k 


/ J-l ^3/2(2—!) Z \ 

I n [Lo.?_ I W > 

' V * 1 / 2 * ' fZ/iie ' , 3 / 2 * j 


Putting c = ^kC k- viiu + s ) we get, as the last term in (144) is < Q k , 

" ( 3 r *--)<«. + «»»"’ (jprams + jii). 

If 4 < A; < 6, we take l = k — 2 and get 


'(»r- 


< Q* + Q/ 


(_L 

y n ( 0 -A)/( 2 ( 


+ l) < Q* 


Hence, by the argument following (70), 

l.u.b. 1<?(«) - Hu) - x(u) I < f - . 

giving (15). If fc > 7, we take l = and get 




Q* + Qk ( 1 + 


,(*-6)/(2(*+3)) 


)< Qk. 


Hence 


l.u.b. | G(u) - m - X(«) | <f= . 


giving (16). Therefore Theorem 4 is proved. 

5. When $\alpha_4 - 1 - \alpha_3^2 = 0$. If $\alpha_4 - 1 - \alpha_3^2 = 0$, then there is unit probability
that $\xi_i$ assumes exactly two values:

$\Pr\{\xi_i = a\} = p, \qquad \Pr\{\xi_i = b\} = q, \qquad p + q = 1.$

Let $\zeta_i = 1$ with probability p and $\zeta_i = 0$ with probability q. Then $\xi_i = b + (a - b)\zeta_i$,
$\eta = (a - b)^2 \cdot \frac{1}{n}\sum(\zeta_i - \bar{\zeta})^2$. Hence it is sufficient to consider the
variable $\sum(\zeta_i - \bar{\zeta})^2 = \eta_1$. Letting $\sum\zeta_i = r = np + \sqrt{npq}\, X$ we have
$\eta_1 = r - \frac{r^2}{n} = npq + (q - p)\sqrt{npq}\, X - pqX^2$. We now consider two distinct cases:

Case (i). $p \ne q$. Here

= Pr{(X + cVn) 2 > c 2 n - 2|c|Vn*}, c = 

Thus F(z) = 1 if z > J | c \ y/n. If * < % | c | y/n, then 

F(z) = Pr{X < -m - (c 2 n - 2 | c | Vn*)M 

+ Pt\X > -cVn + (c 2 n - 2 | c | vWl - Fi(a) + 

To the random variable X Theorem 2 can be applied. Suppose that c < 0; 
then, by Tchebycheff’s inequality, 


F 2 (z) < Pr{X > -eft} < 


c 2 ?! (p — g) 2 n* 


By Theorem 2, 

Fi(z) = Pr{X < —cn — (c 2 ft — 2|c|\/ft 2! ) i } 


Hence 


= Hz) + 


Gz 2 

Vn\p - q 1 


+ e(P 2 +_g 2 ) 

V npq 


<i» lw-*«l<4^ + vj|7 L r^+ n ^r S . / - 

The same inequality holds also for c > 0. 

Case (ii). p = q = 1/2. Here $\eta_1 = \frac{1}{4}(n - X^2)$; hence


(146) Jv{„ > V} - W £ »1 - 

There is no asymptotic expansion for the distribution function of $\eta_1$. (See
(C), p. 83.)



SAMPLING INSPECTION PLANS FOR CONTINUOUS PRODUCTION 
WHICH INSURE A PRESCRIBED LIMIT ON THE OUTGOING QUALITY 

A. Wald and J. Wolfowitz 
Columbia University 

1. Introduction. This paper discusses several plans for sampling inspection of 
manufactured articles which are produced by a continuous production process, 
the plans being designed to insure that the long-run proportion of defectives 
shall not exceed a prescribed limit. The plans are applicable to articles which 
can be classified as “defective” or “non-defective” and which are submitted for 
inspection either continuously or in lots. In Section 2 the notions of “average 
outgoing quality limit” and “local stability” are discussed. The valuable concept
of average outgoing quality limit for lot inspection is due to Dodge and
Romig [4], and that for inspection of continuous production to Dodge [1]. Section
3 contains a description of a simple inspection plan (SPA) applicable
to continuous production and a proof that the plan will insure a prescribed
average outgoing quality limit. Section 4 contains a proof that this inspection
plan also has the important property that it requires minimum inspection when
the production process is in statistical control. In Section 5 is contained the
description of a general class of plans which possess both these important properties.

The problem of adapting SPA to the case when the articles are submitted for 
inspection in lots instead of continuously, is treated in Section 6. Some methods 
of achieving local stability are discussed in Section 7 and a specific plan is devel¬ 
oped there. Finally Section 8 discusses the relationship between the present 
work and that of the earlier and very interesting paper of H. F. Dodge [1], 
mentioned above. 

If a quick first reading is desired the reader may omit the second half of Section 
3 (which contains a proof of the fact that SPA guarantees the prescribed average 
outgoing quality limit) and the entire Section 4 except for its title (the proof of 
the statement made in the title of Section 4 occupies the whole section). 

2. Fundamental notions. In this paper we shall deal only with a product 
whose units can be classified as “defective” or “non-defective.” We shall 
assume that the units of the product are submitted for inspection continuously, 
except in Section 6, where we assume that they are submitted in lots. Through¬ 
out the paper we shall assume that the inspection process is non-destructive, 
that it invariably classifies correctly the units examined, and that defective units, 
when found, are replaced by non-defectives. By the “quality” of a sequence of 
units is meant the proportion of defectives in the sequence as produced. By the 
“outgoing quality” (OQ) of a sequence is meant the proportion of defectives 
after whatever inspection scheme is in use has been applied. If this
scheme involves random sampling, then in general the OQ is a chance variable.
(It depends on the variations of random sampling.) If the OQ converges to
a constant $p_a$ with probability one as the number of units produced increases
indefinitely, $p_a$ is called the “average outgoing quality” (AOQ). The AOQ
when it exists is therefore the average quality, in the long run, of the production
process after inspection. It is a function of both the production process and the
inspection scheme. These definitions are due to Dodge [1].

The “average outgoing quality limit” (AOQL) is a number which is to depend 
only on the inspection scheme and not at all on the production process. Roughly 
speaking, it is a number, characteristic of an inspection scheme, such that no 
matter what the variations or eccentricities of the production process, the AOQ 
never exceeds it. For the purposes of this paper we shall need the following 
precise definition: Let $c_i$ be zero or one according as the ith unit of the product,
before application of the inspection scheme, is a non-defective or a defective,
respectively. Let $d_i$ have a similar definition after application of the inspection
scheme. (We note that if the ith item was inspected, then $d_i = 0$; if the ith
item was not inspected, then $c_i = d_i$.) The sequence $c = c_1, c_2, \cdots, c_N, \cdots$,
ad inf. characterizes the production process¹. The elements of $d = d_1, d_2, \cdots$,
ad inf. are in general chance variables. The number L is called the AOQL if it
is the smallest² number with the property that the probability is zero that

$\limsup_{N} \frac{1}{N}\sum_{i=1}^{N} d_i > L,$

no matter what the sequence c.

It should be noted that this definition of AOQL places no restrictions whatever 
on the production process, since all sequences c are admitted. It is too much 
to expect a production process to remain always in control; indeed, doubt as to 
whether statistical control always exists may cause a manufacturer to institute 
an inspection scheme. The inspection schemes which we shall give below will 
yield a specified AOQL no matter what the variations in production are. If 
these schemes are employed, then, even if Maxwell’s demon of gas theory fame 
were to transfer his activities to the production process, he would be unsuccessful 
in an effort to cause the AOQL to be exceeded. A dishonest manufacturer might 
sometimes essay to do this. If we imposed restrictions on the sequence c and 

1 This use of an infinite sequence to describe the production process deserves a few words. 
What we consider in this paper are schemes which are applicable when the number of units produced
is large and which operate mathematically as if the production sequence were of infinite length.
Naturally the latter is never the case in actuality. However, the larger the number of 
units produced the more nearly will the reality conform to the results derived from the 
mathematical model. While the present definition uses explicitly the notion of an infinite 
sequence, such a commonplace statement as “the probability is 1/2 that a coin will fall 
heads up” uses this notion implicitly. It is also implicit in the intuitive meaning we ascribe 
to such a word as “average,” which is in every day use. 

2 It is not difficult to see that such a number always exists, for it is the lower bound of a
set which is non-empty (it contains the point one), bounded from below (zero is a lower 
bound), and closed. 
determined the AOQL on that basis, we would run the danger that the relative
frequency of defects in the sequence of outgoing units might exceed the AOQL if 
it happened that the actual sequence c did not satisfy the restrictions imposed. 

After we discuss below various possible sampling inspection plans which 
insure that the AOQL does not exceed a predetermined value L, it will be seen 
that for any given L > 0 there are many sampling inspection schemes which do 
this. To choose a particular sampling plan from among them the following 
considerations may be advanced: If two inspection plans S and S' both insure 
the inequality AOQL < L and if for any sequence c the average number of 
inspections required by S is not greater than that required by S' and if for some 
sequences c the average number of inspections required by S is actually smaller 
than that required by S', then S may be considered, in general, a better inspection
plan than S'. However, the amount of inspection required by a sampling
plan is not always the only criterion for the selection of a proper sampling
scheme. There may be also other features of a sampling plan which make it
more or less desirable. We shall mention here one such feature, called “local
stability,” which will play a role in our discussions later. Consider the sequence 
d obtained from the sequence c by applying a sampling inspection scheme. Even 
if the AOQL does not exceed L, it may still happen that there will be many large 
segments of the sequence d within which the relative frequency of ones is considerably
higher than L. For instance, it may happen that in the segment
$(d_1, \cdots, d_m)$ the relative frequency of ones is equal to $\frac{1}{2}L$, in the segment
$(d_{m+1}, \cdots, d_{2m})$ the relative frequency is equal to $\frac{3}{2}L$, in the segment
$(d_{2m+1}, \cdots, d_{3m})$ the relative frequency is again equal to $\frac{1}{2}L$, and this is followed again
by a segment of m elements where the relative frequency of ones is equal to $\frac{3}{2}L$,
and so forth. If m is large, such a sequence d is not very desirable, since each
second segment will contain too many defects. A sequence d is said to be not
locally stable if there exists a large fixed integer m such that the relative frequency
of ones in $(d_{k+1}, \cdots, d_{k+m})$ is considerably greater than L for many integral
values k. On the other hand, the sequence d is said to be locally stable if for
any large m the relative frequency of ones in $(d_{k+1}, \cdots, d_{k+m})$ is not substantially
above L for nearly all integral values k. This is clearly not a precise
definition of “local stability,” but merely an intuitive indication of what we want 
to understand by the term, since we did not define what we mean by “large m,” 
“many values of k,” “considerably above L,” etc. A precise definition of local
stability will not be needed in this paper, since it is not our intention to develop
a complete theory for the choice of the sampling plan. The idea of local stability
will be used in this paper merely for making it plausible that some schemes we
shall consider behave reasonably in this respect. A similar idea, called “protection
against spotty quality,” is discussed by Dodge [1]. A possible precise definition
of local stability could be given in terms of the frequency with which

$F(N) = \frac{1}{k + 1}\sum_{i=N}^{N+k} d_i$

(k being fixed) lies within given limits.
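The moving-window frequency F(N) suggested above can be computed directly for a finite stretch of the outgoing sequence d. The following is a minimal sketch and not part of the paper: the function name `window_frequencies`, the 0-based indexing, and the sample sequence are illustrative assumptions.

```python
def window_frequencies(d, k):
    """Return F(N) = (1/(k+1)) * sum(d[N..N+k]) for every full window.

    d is a list of 0/1 outgoing-quality indicators (0-based here,
    whereas the text numbers units from 1); k is the fixed window
    parameter, so each window holds k + 1 consecutive units.
    """
    width = k + 1
    if width > len(d):
        return []
    total = sum(d[:width])              # sum over the first window
    freqs = [total / width]
    for start in range(1, len(d) - width + 1):
        # slide the window one unit to the right
        total += d[start + width - 1] - d[start - 1]
        freqs.append(total / width)
    return freqs

# Two stretches: sparse defectives first, then dense ones.
d = [0, 0, 0, 1] * 5 + [0, 1, 0, 1] * 5
freqs = window_frequencies(d, k=3)
print(min(freqs), max(freqs))
```

A sequence would look locally stable in the informal sense of the text when nearly all of these window frequencies stay close to or below L; the spread between the smallest and largest value gives a rough picture of how uneven the outgoing quality is.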
3. A sampling inspection plan which insures a given AOQL no matter what
the variations in the production process. The only feature of the sampling
(inspection) plan (SP) studied in this section and hereafter referred to as SPA 
which we shall consider here is that it insures the achievement of a specified 
AOQL. Considerations leading to a choice among several schemes are postponed 
to later sections. 

For convenience, let f be the reciprocal of a positive integer. SPA calls for
alternating partial inspection and complete inspection. Partial inspection
is performed by inspecting one element chosen at random from each of successive
groups of $\frac{1}{f}$ elements. Complete inspection means the inspection of every
element in the order of production. SPA is completely defined when a rule
is given for ending one kind of inspection and beginning the other.

It is clear that all SP need not be of the above class. Thus, for example, a
scheme might consist of partial inspection with various f's employed in various
sequences. We make no attempt in this paper to examine all possible schemes.
For simplicity in practical operation, alternation of complete inspection and
partial inspection with fixed f would seem reasonable. The Dodge scheme [1]
is of this type.

We shall also not discuss the question of a choice of the constant f, but will
assume that a particular value has been chosen for various reasons and is a datum
of our problem. Reasons which might influence a manufacturer in his choice
of f could be contract specifications which impose a minimum on the amount of
inspection, or psychological grounds to the same effect. The manufacturer
may desire a certain minimum amount of inspection in order to detect malfunctioning
of his production process. Also f controls local stability to some
extent. The consequences of a choice of f as they appear in the theory below
may also play a role.

Returning to SPA, we begin with partial inspection. Let L be the specified
AOQL. Denote by $k_N$ the number of groups of $\frac{1}{f}$ units in which defectives
were found as the result of partial inspection from the beginning of production
through the Nth unit. SPA is as follows:

(a) Begin with partial inspection.

(b) Begin full inspection whenever

$e_N = \frac{k_N\left(\frac{1}{f} - 1\right)}{N} > L.$

(c) Resume partial inspection when

$e_N \le L.$

(d) Repeat the procedure. (It will be recalled that defective units, when
found, are always to be replaced with non-defectives.)
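Rules (a) to (d) can be tried out on a finite production sequence. The sketch below is an illustration under stated assumptions, not the authors' procedure in every detail: it assumes the statistic is e_N = k_N(1/f − 1)/N, that e_N is re-evaluated after each group during partial inspection and after each unit during complete inspection, and that a short final group is simply truncated. The names `simulate_spa` and `group_size` (standing for 1/f) are hypothetical.

```python
import random

def simulate_spa(c, group_size, L, seed=0):
    """Apply a sketch of SPA to a 0/1 production sequence c.

    group_size plays the role of 1/f.  Returns (d, inspected), where d
    is the outgoing 0/1 sequence and inspected counts inspected units.
    """
    rng = random.Random(seed)
    d = list(c)
    k = 0            # k_N: partially inspected groups in which a defective was found
    inspected = 0
    i = 0            # number of units produced so far (N)
    while i < len(c):
        e = k * (group_size - 1) / i if i > 0 else 0.0
        if e > L:
            # complete inspection: inspect the next unit, replace it if defective
            d[i] = 0
            inspected += 1
            i += 1
        else:
            # partial inspection: one unit chosen at random from the next group
            stop = min(i + group_size, len(c))
            j = rng.randrange(i, stop)
            inspected += 1
            if c[j] == 1:
                k += 1
                d[j] = 0
            i = stop
    return d, inspected

# Worst case for the plan: every unit produced is defective.
c = [1] * 100_000
d, inspected = simulate_spa(c, group_size=10, L=0.05)
print(sum(d) / len(d), inspected / len(c))
```

In this all-defective run the outgoing fraction of defectives settles near L, which is the behaviour the AOQL property describes; a sequence with no defectives never triggers complete inspection, so only one unit per group is inspected.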
It is to be observed that in this plan the number of partial inspections increases
without limit. For, while complete inspection is going on, the value of $k_N$
remains constant, so that after a long enough period of complete inspection the
denominator N of the expression which defines $e_N$ will have increased sufficiently
for $e_N$ to be not greater than L. On the other hand, complete inspection may
never occur. This will be the case if, for example, no defectives or very few
defectives are produced.

We shall now show that the AOQL of the above SP is L. We first note that,
at N, $e_N$ can increase only by $\frac{1}{N}\left(\frac{1}{f} - 1\right)$. Hence, for sufficiently large N,
$e_N < L + \epsilon$, where $\epsilon > 0$ may be arbitrarily small.

Suppose now that the production process is subject to any variations whatsoever,
i.e., the sequence

$c = c_1, c_2, \cdots, c_N, \cdots,$ ad inf.

is any arbitrary sequence whatever (by their definition the $c_i$ are all zero or one).
Our result is therefore proved if we show that, with probability one,

(3.1)    $\lim_{N \to \infty}\left(e_N - \frac{1}{N}\sum_{i=1}^{N} d_i\right) = 0$

for this arbitrary c, and that for at least one c

(3.2)    $\lim_{N \to \infty} e_N = L.$


Let S(N) be the number of groups of $\frac{1}{f}$ units which have been partially inspected
through the Nth unit. Define $x_i$ as zero if in the ith partially inspected
group a non-defective was found and as one if a defective was found. We have

$k_N = \sum_{i=1}^{S(N)} x_i.$

Since the number of times partial inspection takes place increases indefinitely,
$S(N) \to \infty$ as $N \to \infty$. Also $S(N) \le fN < N$. Let $a_j$ be the serial number
of the last unit in the jth partially inspected group. Then for all j the expected
value $E(x_j)$ of $x_j$ is given by

$E(x_j) = f\left(\sum_{i = a_j - (1/f) + 1}^{a_j} c_i\right).$

We have, for all j

$\sum_{i = a_j - (1/f) + 1}^{a_j} (c_i - d_i) = x_j$
so that 
Also from (3.3) it follows, since $x_j$ is the value of a binomial chance variable
from a population of fixed number $\left(\frac{1}{f}\right)$, that there exists a positive constant $\theta$
such that


(3.4) 



where $\sigma^2(x)$ is the variance of a chance variable x. Now a theorem of Kolmogoroff
(Kolmogoroff [2], Fréchet [3], p. 254) states:

A sequence of chance variables with zero means and variances $\sigma_1^2, \sigma_2^2, \cdots$
converges with probability one towards zero in the sense of Cesàro if

(3.5)    $\sum_{j=1}^{\infty} \frac{\sigma_j^2}{j^2}$

converges. The inequality (3.4) permits us to apply this theorem to the sequence
of chance variables of which the jth (j = 1, 2, $\cdots$ ad inf.) is



with probability one, 



since the units which are fully inspected contribute nothing to $\sum d_i$. Since
$S(N) \le N$, the desired result (3.1) is a fortiori true.

If c is such that all the $c_i$ are one, it is readily seen that (3.2) holds. If many
(this adjective can be precisely defined) defectives are produced, this will also 
be the case. This completes the proof of the fact that the AOQL of SPA is L 
no matter how capriciously the production process may vary. 
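The expectation step in the proof can be checked by brute force for a single partially inspected group. The sketch below is illustrative only: it enumerates every equally likely choice of the inspected unit with exact fractions and compares the expected number of defectives left in the group with (1/f − 1) times E(x_j), which is the quantity that e_N accumulates. The function name and the example group are hypothetical.

```python
from fractions import Fraction

def partial_inspection_expectations(group):
    """Exact expectations for one partially inspected group.

    group is a list of 0/1 values c_i (its length is 1/f).  One position
    is inspected, each with equal probability f; a defective found there
    is replaced.  Returns (E[x_j], E[sum of d_i over the group]).
    """
    size = len(group)
    f = Fraction(1, size)
    e_x = Fraction(0)
    e_d = Fraction(0)
    for pos in range(size):
        x = group[pos]                  # 1 if the inspected unit is defective
        remaining = sum(group) - x      # defectives left after replacement
        e_x += f * x
        e_d += f * remaining
    return e_x, e_d

group = [1, 0, 0, 1, 0]                 # 1/f = 5, two defectives
e_x, e_d = partial_inspection_expectations(group)
print(e_x, e_d, (len(group) - 1) * e_x)
```

For any group the second and third printed values coincide, which is why k_N(1/f − 1)/N tracks the outgoing fraction of defectives on average.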


4. When the production process is in statistical control, SPA requires minimum 
inspection. The production process is said to be in statistical control if there 
is a positive constant p < 1 such that, for every i, the probability that $c_i = 1$
is p and is independent of the values taken by the other c's. We shall see that
if the process is in statistical control and if SPA is applied to it, the specified
AOQL is guaranteed with a minimum amount of inspection.

The number of units inspected through the Nth unit produced is

(4.1)    $I(N) = N - \left(\frac{1}{f} - 1\right)S(N).$

If the process is in statistical control we have, with probability one, 
(4.2)    $\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} c_i = p$

by the strong law of large numbers. Shortly we shall prove the existence of a
constant L* such that, with probability one,

(4.3)    $\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} d_i = L^*.$

Assume for the moment that this is so. Since it is only by inspection that defectives
are removed, and the units selected for inspection are in statistical control
like the original sequence, it follows that, with probability one,

(4.4)    $\lim_{N \to \infty} \frac{I(N)}{N} = \frac{1}{p}(p - L^*) = 1 - \frac{L^*}{p}$

because, with probability one,

$\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^{N} (c_i - d_i) = p - L^*.$

Inspection is therefore at a minimum when L* is at a maximum compatible 
with the specified AOQL. By (4.3) the latter means that 

(4.5)    $L^* \le L.$

SPA has been shown to guarantee this requirement. The optimum situation
from the point of view of the amount of inspection would therefore be to have
L* = L, but this cannot always be achieved. The absolute minimum amount
of inspection clearly is f, i.e., partial inspection exclusively. Consequently
from (4.4)

$1 - \frac{L^*}{p} \ge f$

so that

(4.6)    $L^* \le p(1 - f).$

Combining (4.5) and (4.6) we see that we have to consider three cases:

Case a. If

(4.7)    $p > \frac{L}{1 - f}$

we have to show that

(4.8)    $L = L^*.$

Case b. If

(4.9)    $p < \frac{L}{1 - f}$

we have to show, by (4.4), that

$\lim_{N \to \infty} \frac{I(N)}{N} = f,$

that is,

(4.10)    $L^* = p(1 - f).$

Case c. If

(4.11)    $p = \frac{L}{1 - f}$

we have to show that

(4.12)    $L = L^* = p(1 - f).$

Proof of (4.8): We have already remarked in Section 3 that in SPA partial
inspection always recurs, but complete inspection need never occur. We shall
show in a moment that (4.7) implies that no matter how large an integer $\gamma$
is chosen, the probability of temporarily stopping partial inspection for some
$N > \gamma$ is one. Assume that this is so. Choose an arbitrarily small positive
$\epsilon$, and let $\gamma > \frac{1}{\epsilon}\left(\frac{1}{f} - 1\right)$. For a sequence where complete and partial inspection
alternate infinitely many times let

$A = \alpha_1, \alpha_2, \cdots,$ ad inf.

be the sequence of integers at which partial inspection ends, and let

$B = \beta_1, \beta_2, \cdots,$ ad inf.

be the sequence of integers at which complete inspection ends. Then, for all j,

$\alpha_{j+1} > \beta_j > \alpha_j.$

From the description of SPA it follows that, for all $N > \gamma$ which belong to either
A or B,

(4.13)    $|e_N - L| < \epsilon.$

In Section 3 we proved

(3.1)    $\lim_{N \to \infty}\left(e_N - \frac{1}{N}\sum_{i=1}^{N} d_i\right) = 0$

with probability one. Since $\epsilon$ is arbitrarily small it follows that, with probability
To complete the proof of (4.8) we have still to show that L* exists and that the
probability is one that complete inspection will occur infinitely many times.
First we prove that L* exists.

As N increases during an interval of complete inspection, $D(N) = \sum_{i=1}^{N} d_i$
remains constant. Hence $\frac{D(N)}{N}$ decreases monotonically. Since for the ends
of such intervals (4.14) holds, it follows that (4.14) holds as $N \to \infty$ and is a
member of A, B, or an interval $(\alpha_j, \beta_j)$ for all j.

Let $N \to \infty$ while always being in the interior of an interval $(\beta_j, \alpha_{j+1}]$, j =
1, 2, $\cdots$, ad inf., which contains $\alpha_{j+1}$ but not $\beta_j$. Let N* be the total number
of units in these intervals through the Nth unit produced. Let $N_1$ and $N_2$ be
such that

$\beta_j = N_1 < N_2 \le \alpha_{j+1}.$

Then

$N_2^* - N_1^* = N_2 - N_1.$
N* - Ni = Ni — Ni. 


Since the production process is in statistical control, we have, by the strong
law of large numbers,

(4.15)    $\lim_{N \to \infty} \frac{D(N)}{N^*} = p(1 - f) = p'$


with probability one. Let 5* be the general designation for numbers <« in 
absolute value, so that all S* are not the same. With probability one for almost 
all N, we have by (4.15) 


pm 

Nt 


p' + a* 


pm 

Nt 


= p'+ 5 *. 


Write 


Now 


f Dm - Dm] _ v 

(Nt - NO 


Dm 

~nT 


pm + [pm - pmi pm + 1 Dm - omi 

Ni + (Ni - Ni ) “ Ni + (Nt - NO 


(Pi + d*)Nj ± K(Nt - no 
N i + (Nt - NO 


= p' + 


Hence 

(4.16) 


K(Nt - NO = 2 S*Ni +(?’ + 5 *)(N t - NO. 
Now suppose (4.3) does not hold. From the definition of AOQL it follows that
for some ij > € there exist sequences (whose totality has a positive probability) 
so that, for infinitely many N 2 we have 

( 4 .m - gw + P w-m)] <t _ 4 , 

Nt Ni + (Nt - Ni) 

For large enough A r i, from (4.14), 


*P-L + .* 

Nl 


with probability one and hence, using (4.16) in (4.17) 

(4 18) ^ + ^ + 2i * N ' + iP ' + S * )iN * ~ Nl) ' 

" " < LNi + L(N 2 - N{) - 4 v N 2 

from which, using the fact that p' > L (from (4.7)), we get 

(4.19) N t 5* + 2 NU* + 6*(Ni - Nt) < -4ij N 2 . 

((4.18) and (4.19) hold for the sequences for which (4.17) holds, except perhaps 
on a set of sequences whose probability is zero.) Since Nt < Ni and | 5* | < rj, 
we have, on the other hand, 


(4.20) 


N t S* + 2 N*S* + &*(N, - Nt) > - 3 vNt - v(N 2 - Nt) 

> — 4:rjNi — 417(^2 — Ni) = — 4 i 72 V 2 


which contradicts (4.19) and proves the desired result ((4.3) and (4.8)), except
that it remains to prove that, no matter how large $\gamma$, the probability of temporarily
stopping partial inspection at some $N > \gamma$ is one. Let $\gamma_0 > \gamma$ be some
integer at which partial inspection is going on. From (4.2) and (4.7) it would
follow, if partial inspection never ceased on a set of sequences with positive
probability, that, on this set, with conditional probability one, for N sufficiently
large and $\epsilon$ sufficiently small,


Un ^ 

f(N - 70) ~f 


+ 


—- - — ( U /} > L + (1 - /)«, 


N — yt> fN 


N 


N 


e„ > L+ 

This contradiction proves that complete inspection is eventually resumed and 
completes the proof of minimum inspection in Case a. 

Proof of (4.10): We shall prove that (4.9) implies that, with probability one,
complete inspection will cease, never to be resumed. For, from (4.15) and
(4.9) it follows that for N sufficiently large and $\epsilon$ sufficiently small,

(4.21)    $\frac{D(N)}{N^*} = p' + \delta^* < L - 2\epsilon.$

Hence, a fortiori,

(4.22)    $\frac{D(N)}{N} < L - 2\epsilon.$

((4.21) and (4.22) hold with probability one.)

(3.1) states that, with probability one, 

Hence for all N sufficiently large, with probability one, 


e N < L — e, 


i.e., with probability one complete inspection is never resumed. 

When (4.9) holds, therefore, with probability one and with a finite number of 
exceptions SPA will require only partial inspection. 

Proof of (4.12): If $p = \frac{L}{1 - f}$ and complete inspection finally never resumes,
then (4.12) follows easily. If $p = \frac{L}{1 - f}$ and partial and complete inspection
alternate infinitely many times, then the proof is similar to that of (4.8) and is
therefore omitted. In either case the desired result follows.


5. A class of SP all of which insure both a given AOQL and minimum inspection.
Let the definition of SPA be modified in the following particulars:

(b) Begin full inspection whenever 

**(?" 0 

e* = ' J N ' > L + <p(N). 

(c) Resume partial inspection when

$e_N \le L - \psi(N).$

Let $\phi(N)$ and $\psi(N)$ be such that

$-\psi(N) \le \phi(N),$
$\lim_{N \to \infty} \phi(N) = \lim_{N \to \infty} \psi(N) = 0.$

(SPA corresponds to the case $\phi(N) \equiv \psi(N) \equiv 0$.) Then all the SP of this class
have the property that the AOQL is L and that inspection is at a minimum in
the sense of Section 4. The proofs are essentially the same as those for SPA
and hence will be omitted.


6. The inspection plans of Section 5 can also be applied to lot inspection.

We shall carry on the discussion of this section in terms of SPA, but the results
apply to all the members of the class of plans described in Section 5. We shall
show that SPA can also be applied when the product is submitted for inspection
in lots. Although we assumed previously that the units of the product are
arranged in order of production, the results obtained for SPA remain valid for
any arbitrary arrangement of the units. If the product is submitted in lots we
may arrange the units as follows: Let $l_1, l_2, \cdots$, etc. be the successive lots in
the order of their submission for inspection. Within each lot we consider the
units arranged in the order in which they are chosen for inspection. In this way
we have arranged all units in an ordered sequence and the inspection can be
applied as described before. Thus, we start with partial inspection, i.e., we
take out groups of $\frac{1}{f}$ elements in $l_1$ and inspect one unit (selected at random)
from each of these groups. When $e_N > L$, we start complete inspection and
revert to partial inspection as soon as $e_N \le L$. When the units in $l_1$ are used
up in the process of inspection, we continue, using the units of $l_2$, etc.

If it is found inconvenient to take out a group of $\frac{1}{f}$ units and then to select
one unit for inspection, we could modify the sampling inspection plan as follows:
Instead of taking out a group of $\frac{1}{f}$ units and then selecting at random one unit
from it, we select at random one unit from the uninspected part of the lot and
look upon this unit as the unit selected at random from a hypothetical group of
$\frac{1}{f}$ units. Thus we can proceed exactly as before, except that we have to keep in
mind that with each unit inspected under “partial inspection” we have used
up another set of $\frac{1}{f} - 1$ units. Thus, as soon as $\left(\frac{1}{f} - 1\right)$ times the number of
units inspected under “partial inspection” becomes equal to or greater than the
number of units in the uninspected part of the lot, the inspection of that lot is
already terminated, and we have to start using the units of the next lot. The
inconvenience caused by the necessity of keeping track of the number of units
inspected under “partial inspection” and of the number of units in the uninspected
part of the lot can be eliminated by further modifying the inspection
plan as follows: Instead of beginning complete inspection as soon as $e_N > L$,
we continue “partial inspection” until $E_N = e_N - L$ is so large that complete
inspection of all the units of the lot not yet used up has to be made in order to
bring $e_N$ down to L at the end of the lot. This leads to the following sampling
procedure, to be known as SPB: Let $N_0$ be the number of units in the lot, let
$N_L$ be the serial number of the last unit in the preceding lot, and let
$E(N_L) = N_L E_{N_L} = N_L(e_{N_L} - L)$ be the “excess” carried over from the preceding lot.
For simplicity assume that the following are all integers:

$L N_0 = M,$

$\frac{fM}{1 - f} = M^*,$

$f N_0 = N^*,$

and

$\frac{f E(N_L)}{1 - f} = E^*.$


The inspection procedure is then as follows: Inspect successive units drawn at
random until either

(a) $M^* - E^*$ defectives have been found in the first $N' \le N^*$ units inspected.
In this case inspect further an additional $N_0 - \frac{N'}{f}$ units and this terminates the
inspection of the lot. The excess to be carried over to the next lot is then zero.
Or

(b) $N^*$ units have been inspected and the number of defectives found is
$H < M^* - E^*$. In this case the inspection of the lot is terminated and the present
negative excess

$E(N_L + N_0) = \frac{1 - f}{f}[H - (M^* - E^*)]$

is carried over to the next lot. (The serial number of the last element in the
present lot is $N_L + N_0$ and

$e_{(N_L + N_0)} = \frac{N_L e_{N_L} + H\frac{(1 - f)}{f}}{N_L + N_0}.$

Hence the present excess is

$(N_L + N_0)[e_{(N_L + N_0)} - L] = N_L e_{N_L} + H\frac{(1 - f)}{f} - L N_L - L N_0$
$= N_L(e_{N_L} - L) + H\frac{(1 - f)}{f} - M$
$= \frac{1 - f}{f}[H - M^* + E^*],$

as given above.)

We note an important property of SPB: The excess carried over from a preceding
lot is never positive.

7. Possible modifications of the SP to achieve local stability. Although

the sampling plans discussed in previous sections are optimum in the sense that 
they guarantee the desired AOQL with a minimum of inspection when the 
production process is in statistical control, they do not always behave very 
favorably as far as local stability is concerned. To make this point clear, 
consider the following example: Suppose that during a very long initial time 
period the production process functions very well and the relative frequency 
of defectives produced is well below L. Thus, applying SPA, say, $e_N - L$ will
be considerably less than zero at the end of this period. Now suppose that then
the production process suddenly deteriorates and the number of defectives
produced during the next period of time is considerably higher than L. In spite
of that, complete inspection will not begin for quite some time because $e_N$ became
so small during the initial period. Thus there will be a long segment in the sequence
of outgoing units within which the relative frequency of defectives will
be larger than the prescribed AOQL. Of course, this segment will be counterbalanced
by other segments where the relative frequency of defectives will be
below the AOQL, so that the AOQL will not be violated. Nevertheless, the
occurrence of long segments with too many defectives, i.e., a lack of local stability,
is not desirable.

It should be noted that, even though SPA was not designed to achieve considerable
local stability, drastic lack of local stability cannot occur when the
production process is in statistical control and SPA is employed. In the example 
given above where the outgoing quality was not locally stable, it was assumed 
that there were variations in the production process. The existence of statistical 
control acts as an important stabilizing factor on the quality. 

In this section we want to discuss several possible modifications of SPA which
will insure a greater degree of local stability. One such modification is the
following: We choose a positive constant A and we define the excess $E^*$ for each
value N as follows: $E^*(N)$ is equal to the excess $E(N)$ as originally defined
($= N[e_N - L]$) as long as, for all $N' \le N$, $E(N') \ge -A$. The difference
$E^*(N + 1) - E^*(N) = E(N + 1) - E(N)$ for all N for
which $E(N + 1) - E(N) \ge 0$. If $E(N + 1) - E(N) < 0$, then
$E^*(N + 1) = \max[E^*(N) + (E(N + 1) - E(N)), -A]$. In other words, with this modification
of the sampling inspection plan we set a lower bound $-A$ for the excess.
When the excess is positive we begin complete inspection, and revert to partial
inspection when the excess becomes non-positive. The effect of this is that, if
the proportion of defectives produced becomes large, complete inspection will
not be delayed very long, although the proportion of defectives produced in the
preceding period may have been considerably below L. It is clear that this
modification of SPA does not increase the AOQL. However, the amount of
inspection will be somewhat increased, especially when the quality of the product
is less than or only slightly greater than L. If the constant A is large, the increase
in the amount of inspection is only slight, but also the degree of local
stability achieved is not very high. On the other hand, if A is small, the increase
in the amount of inspection may be considerable, but a high degree of local
stability is achieved. Thus, the choice of A should be made so that a proper
balance between local stability and amount of inspection is achieved.
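
The update rule for the bounded excess can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `floored_excess`, the argument names, and the starting value of zero are hypothetical and do not come from the paper; the increments E(N + 1) - E(N) are supplied by the caller.

```python
def floored_excess(increments, floor_a, start=0):
    """Return the path of the modified excess E*(N) for a list of increments.

    increments : successive differences E(N + 1) - E(N) of the original excess
    floor_a    : the positive constant A; the modified excess never drops below -A
    start      : assumed starting value of the modified excess (hypothetical choice)
    """
    if floor_a <= 0:
        raise ValueError("A must be positive")
    path = []
    e_star = start
    for step in increments:
        if step >= 0:
            # Non-negative increments are passed through unchanged.
            e_star = e_star + step
        else:
            # Negative increments are applied, but the excess is held at -A.
            e_star = max(e_star + step, -floor_a)
        path.append(e_star)
    return path


if __name__ == "__main__":
    # A long good period followed by a bad step: the floor limits the credit built up.
    print(floored_excess([-1, -1, -1, 2, -5], floor_a=2))
```

With a small A the credit accumulated during a good period is capped, so a later deterioration pushes the excess above zero (and hence triggers complete inspection) sooner.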

Modifying SPA by setting a lower limit for the excess has the disadvantage
that the mathematical treatment of this case is involved. We shall, therefore,
consider another modification of the inspection plan which will have largely the
same effect, but whose mathematical treatment appears to be much simpler. A
fixed positive integer $N_0$ is chosen and the inspection scheme is designed so that
$E_{N_0} \le 0$ is assured. If $E_{N_0}$ is negative, we replace it by zero. In other words,
no excess is carried over from the first segment of $N_0$ units to the next segment of
$N_0$ units. Thus, the second segment of $N_0$ units is treated exactly the same way
as if it were the first segment, and this is repeated for each consecutive segment
of $N_0$ units. This modification of SPA (the resulting plan is to be known as
SPC) has essentially the same effect as setting a lower bound for the excess.
Again it is clear that by this modification the AOQL is not increased, but the
amount of inspection may be increased. The latter is particularly true when
$N_0$ is small, which corresponds to very high local stability requirements. More
efficient plans than SPC can probably be devised for this situation.

Undoubtedly, there are many other possible modifications of the inspection
plan by which a greater degree of local stability can be achieved at the price of
somewhat increased inspection. It is not the purpose of this paper to enumerate
all these possibilities or to develop a theory as to which of them may be considered
an optimum procedure. We shall restrict ourselves to a discussion of the
mathematical consequences of SPC. First we define it precisely. If it is to be
applied to inspection of lots of size $N_0$ then SPC is simply SPB with $E(N_L)$
and $E^*$ always zero. When applied to continuous production it will operate
as follows: Assume for convenience that $M = LN_0$, $N^* = fN_0$, and $\frac{fM}{1 - f} = M^*$
are all integers.

(a) Begin each segment of $N_0$ units with partial inspection, i.e., inspect one
unit chosen at random from each successive group of $\frac{1}{f}$ units. Continue partial
inspection until one of the following events occurs: either

(b) $M^*$ defectives are found. In this case begin complete inspection with the
first unit which follows the group in which the last of the $M^*$ defectives was
found and continue until the end of the segment of $N_0$ units.

or

(b') $N^*$ groups of $\frac{1}{f}$ units are partially inspected.

(c) Repeat with the next segment of $N_0$ units.

Comparison with SPB shows that, in SPC, if (b) occurs earlier or at the same
time as (b'), then $E_{N_0} = 0$, while if (b') occurs before (b) we have $E_{N_0} < 0$.
In contradistinction to SPB, in SPC there is no carrying over of the excess.
Let us determine the AOQ for SPC when the production process is in a state
of statistical control. Denote by p the probability that a unit produced will be
defective. Let the chance variable H denote the number of defectives found
during partial inspection. The probability that $H = i < M^*$ is

$\binom{N^*}{i} p^i (1 - p)^{N^* - i}.$

$H \le M^*$ always. We have, when $H = i$,

$E(N_0) = \frac{(1 - f)i}{f} - LN_0,$

and hence

$e_{N_0} = \frac{(1 - f)i}{f N_0}.$

The AOQ is therefore $\frac{1 - f}{f N_0}$ multiplied by the expected value of H and is
therefore

(7.1)    $L - \frac{1 - f}{N^*} \sum_{i=0}^{M^* - 1} (M^* - i) \binom{N^*}{i} p^i (1 - p)^{N^* - i}.$

The reduction from the original quality p to the AOQ was achieved by inspecting
a fraction of units which is $\frac{1}{p}$ times the reduction in the frequency of defectives.
Hence, with probability one, the fraction of units inspected when the production
process is in statistical control is

(7.2)    $1 - \frac{L}{p} + \frac{1 - f}{N^* p} \sum_{i=0}^{M^* - 1} (M^* - i) \binom{N^*}{i} p^i (1 - p)^{N^* - i}.$

When $p > \frac{L}{1 - f}$ we see from Section 4 that the third term of the right member
of (7.2) represents the price paid in fraction of inspection above the minimum in
return for the local stability achieved. When $p < \frac{L}{1 - f}$ the additional inspection
is of course the fraction (7.2) less f.

As $N_0$ becomes larger, SPC becomes more and more like SPA, and consequently
the amount of inspection tends to the minimum. As $N_0$ becomes
smaller, the degree of local stability achieved becomes higher and must be
paid for by an increasing amount of inspection. An illustrative example will be
given in the next section. It has already been pointed out that the mere existence
of statistical control implies a considerable amount of local stability even
when SPA is applied.
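
The AOQ and the fraction inspected under SPC can be evaluated numerically. The sketch below assumes the reading AOQ = L - (1 - f)Γ/N* with Γ = Σ (M* - i) C(N*, i) p^i (1 - p)^(N* - i), the sum running over i = 0, ..., M* - 1, and takes the fraction inspected as (p - AOQ)/p. The function name `spc_aoq_and_fraction`, its argument names, and the rounding of N* and M* to integers are assumptions made for illustration only.

```python
from math import comb


def spc_aoq_and_fraction(p, limit, f, n0):
    """Return (AOQ, fraction inspected) for SPC under statistical control.

    p     : probability that a produced unit is defective (must be > 0)
    limit : the prescribed AOQL, written L in the text
    f     : fraction of units inspected under partial inspection
    n0    : segment length N_0
    """
    # N* = f N_0 and M* = f L N_0 / (1 - f) are assumed to be integers in the
    # text; rounding guards against floating point noise in the products.
    n_star = round(f * n0)
    m_star = round(f * limit * n0 / (1 - f))
    # Gamma = sum over i < M* of (M* - i) C(N*, i) p^i (1 - p)^(N* - i)
    gamma = sum(
        (m_star - i) * comb(n_star, i) * p**i * (1 - p) ** (n_star - i)
        for i in range(m_star)
    )
    aoq = limit - (1 - f) * gamma / n_star
    # The fraction inspected is 1/p times the reduction in defectives.
    fraction = 1 - aoq / p
    return aoq, fraction


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10):
        aoq, fraction = spc_aoq_and_fraction(p, limit=0.045, f=0.1, n0=400)
        print(f"p={p:.2f}  AOQ={aoq:.4f}  fraction inspected={fraction:.3f}")
```

Because Γ is non-negative, the computed AOQ never exceeds L, in line with the statement that SPC does not increase the AOQL.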

The only practical difficulty which may arise in evaluating the formulas in
(7.1) and (7.2) might come from attempting to evaluate

$\Gamma = \sum_{i=0}^{M^* - 1} (M^* - i) \binom{N^*}{i} p^i (1 - p)^{N^* - i}.$

For those values of the parameters which are likely to occur in application, a
good approximation to $\Gamma$ (exactly how good we shall not investigate here) is
given by

$T = \sum_{i=0}^{M^* - 1} (M^* - i) \frac{e^{-N^* p} (N^* p)^i}{i!}.$

A table of T for integral values of $M^*$ from 2 to 16 and for integral values of $N^* p$
from 1 to 25 is given below. The computations were performed under the
direction of Mr. Mortimer Spiegelman of the Metropolitan Life Insurance
Company, to whom the authors are deeply obliged.
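
The Poisson approximation T is easy to evaluate directly. The following minimal sketch assumes the form T = Σ (M* - i) e^(-N*p) (N*p)^i / i! with i running from 0 to M* - 1; the function name `poisson_t` and its argument names are hypothetical.

```python
from math import exp, factorial


def poisson_t(m_star, lam):
    """Return T = sum_{i=0}^{M*-1} (M* - i) e^{-lam} lam^i / i!, with lam = N* p."""
    return sum(
        (m_star - i) * exp(-lam) * lam**i / factorial(i)
        for i in range(m_star)
    )


if __name__ == "__main__":
    # A few entries for small M* and N* p.
    for m_star in (2, 3, 4):
        row = [round(poisson_t(m_star, lam), 2) for lam in (1, 2, 3)]
        print(m_star, row)
```

For M* = 2 and N*p = 1 the sum reduces to 3/e, about 1.10.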


Table of $T = \sum_{i=0}^{M^* - 1} (M^* - i) \frac{e^{-N^* p} (N^* p)^i}{i!}$

                                           N*p
M*-1       1      2      3      4      5      6      7      8      9     10     11     12
   1    1.10    .54    .25    .11    .05    .02    .01    .00    .00    .00    .00    .00
   2    2.02   1.22    .67    .35    .17    .08    .04    .02    .01    .00    .00    .00
   3    3.00   2.08   1.32    .78    .44    .23    .12    .06    .03    .01    .01    .00
   4    4.00   3.02   2.13   1.41    .88    .52    .29    .16    .08    .04    .02    .01
   5    5.00   4.01   3.05   2.20   1.49    .96    .59    .35    .20    .11    .06    .03
   6    6.00   5.00   4.02   3.08   2.26   1.57   1.04    .66    .41    .24    .14    .08
   7    7.00   6.00   5.01   4.03   3.12   2.31   1.64   1.12    .73    .46    .28    .17
   8    8.00   7.00   6.00   5.01   4.05   3.16   2.37   1.71   1.19    .79    .51    .32
   9    9.00   8.00   7.00   6.00   5.02   4.08   3.20   2.43   1.77   1.25    .85    .56
  10   10.00   9.00   8.00   7.00   6.01   5.03   4.10   3.24   2.48   1.83   1.31    .91
  11   11.00  10.00   9.00   8.00   7.00   6.01   5.05   4.13   3.28   2.53   1.89   1.37
  12   12.00  11.00  10.00   9.00   8.00   7.01   6.02   5.07   4.16   3.32   2.58   1.95
  13   13.00  12.00  11.00  10.00   9.00   8.00   7.01   6.03   5.08   4.19   3.36   2.63
  14   14.00  13.00  12.00  11.00  10.00   9.00   8.00   7.01   6.04   5.10   4.22   3.40
  15   15.00  14.00  13.00  12.00  11.00  10.00   9.00   8.01   7.02   6.05   5.12   4.25


8. The SP of H. F. Dodge. H. F. Dodge [1] has proposed a very interesting
SP for continuous production. The plan is defined by two constants i and f
and may be described as follows: Begin with complete inspection of the units
consecutively as produced and continue such inspection until i units in succession
are found non-defective. Thereafter inspect a fraction f of the units.
Continue partial inspection until a defect is found. Then start complete inspection
again and continue until i units in succession are found non-defective.
Repeat the procedure.

Dodge [1] derived formulas for determining the AOQL corresponding to any
pair i and f, under the assumption that the production process is in a state of
statistical control. Dodge’s formulas for the AOQL are not necessarily valid
if we do not make this restriction on the production process, i.e., if we admit
that the probability p that a unit will be defective may vary in any arbitrary
way during the production process. This, of course, is not a criticism of the
derivation of the formulas; it cannot be considered surprising that a formula is
not valid under assumptions different from those under which it was derived.
However, it is relevant to point out the fact that the Dodge SP does not guarantee
the AOQL under all circumstances, so that care must be taken to ensure that
certain requirements are met. Exactly what these requirements are is not
known; statistical control is a sufficient condition, but is probably not necessary
and could be weakened. It seems likely to the authors that, if p varies only
slowly (with N) with infrequent “jumps,” the Dodge SP will produce results
which will exceed the AOQL by little, if at all. But if the “jumps” are numerous
and appropriately spaced it is possible to exceed the AOQL by substantial
amounts, as the example below will show. The Dodge plan was intended to
serve as an aid to the detection and correction of malfunctioning of the production
process and this use would tend to prevent the occurrence of such a phenomenon.
Parenthetically, it should be remarked that the information obtained in
the course of inspection according to either the plans discussed in this paper or
any reasonable scheme should, if possible, be sent at once to the producing
divisions for their guidance.

Table of $T = \sum_{i=0}^{M^* - 1} (M^* - i) \frac{e^{-N^* p} (N^* p)^i}{i!}$ (Continued)

                                              N*p
M*-1      13     14     15     16     17     18     19     20     21     22     23     24     25
   1     .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00
   2     .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00
   3     .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00
   4     .01    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00
   5     .02    .01    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00    .00
   6     .04    .02    .01    .01    .00    .00    .00    .00    .00    .00    .00    .00    .00
   7     .10    .05    .03    .02    .01    .00    .00    .00    .00    .00    .00    .00    .00
   8     .20    .12    .07    .04    .02    .01    .01    .00    .00    .00    .00    .00    .00
   9     .36    .23    .14    .08    .05    .03    .01    .01    .00    .00    .00    .00    .00
  10     .61    .40    .26    .16    .10    .06    .03    .02    .01    .01    .00    .00    .00
  11     .97    .66    .44    .29    .18    .11    .07    .04    .02    .01    .01    .00    .00
  12    1.43   1.02    .71    .48    .32    .20    .13    .08    .05    .03    .02    .01    .01
  13    2.00   1.48   1.07    .75    .52    .35    .23    .15    .09    .06    .03    .02    .01
  14    2.68   2.05   1.54   1.12    .80    .55    .38    .25    .16    .10    .07    .04    .02
  15    3.44   2.72   2.10   1.59   1.17    .84    .59    .41    .27    .18    .12    .07    .05

An example to show that the AOQL can be exceeded can be constructed as
follows: Let i = 54 and f = 0.1. Then according to the graphs of [1], page
272, the AOQL should be 0.02. Define a sequence of 60 successive units free
of defectives as a segment of type 1, and a sequence of 60 successive units where
the production process is in statistical control with p = 0.1, as a segment of type
2. Suppose that the sequence of units produced consists of segments of types
1 and 2 always alternating. Then it follows that the first item inspected in a
segment of type 2 is always inspected on a partial inspection basis. We now
assume that, unless the occurrence of a defective has previously terminated
partial inspection, the 1st, 11th, 21st, 31st, 41st, and 51st items in a segment
of type 2 will be chosen for partial inspection, and if the 1st item is found defective,
the entire segment of type 2 will be cleared of defectives. (Both of these
assumptions favor the Dodge SP.) Then the situation is as described in the
following table:


          (1)                         (2)                               (3)
Item      Probability of first        Expected number of defectives     (1) x (2)
          terminating partial         remaining in segment of type 2
          inspection at each item     after partial inspection has
                                      been terminated
1st       .1                          0                                 0
11th      (.9)(.1) = .09              .9                                .081
21st      (.9)^2(.1) = .081           1.8                               .1458
31st      (.9)^3(.1) = .0729          2.7                               .19683
41st      (.9)^4(.1) = .06561         3.6                               .236196
51st      (.9)^5(.1) = .059049        4.5                               .2657205

          Probability that an         Expected number of defectives     Product
          entire segment of type 2    left in a segment of type 2
          will be partially           which has been inspected
          inspected                   only partially
          (.9)^6 = .531441            5.4                               2.8697814

                                                                        Sum = 3.7953279

The AOQ is therefore 3.7953279/120 = .0316+, while L = .02.
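
The arithmetic of this example can be reproduced with a short script. The sketch below follows the stated assumptions of the example (items 1, 11, ..., 51 are sampled, a defective 1st item clears the whole segment, and a defective found at a later sampled item clears the remainder of the segment); the variable names are hypothetical.

```python
p = 0.1                 # probability of a defective within a segment of type 2
segment_length = 60     # units per segment, of either type
group_size = 10         # one unit in ten is sampled under f = 0.1
samples = segment_length // group_size   # the 1st, 11th, ..., 51st items

expected_remaining = 0.0
for k in range(samples):
    # Partial inspection first terminates at the (k + 1)-th sampled item.
    prob_stop_here = (1 - p) ** k * p
    # The k earlier groups each passed 9 uninspected units.
    defectives_passed = (group_size - 1) * k * p
    expected_remaining += prob_stop_here * defectives_passed

# No sampled item is defective: the whole segment is only partially inspected.
prob_never_stopped = (1 - p) ** samples
expected_remaining += prob_never_stopped * (group_size - 1) * samples * p

# Each segment of type 2 is paired with a defect-free segment of type 1.
aoq = expected_remaining / (2 * segment_length)

print(f"expected defectives remaining per segment of type 2: {expected_remaining:.7f}")
print(f"AOQ: {aoq:.4f}")
```

The script averages the expected number of outgoing defectives over one segment of each type, that is, over 120 units.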


It is therefore difficult to compare the Dodge plan with any of the plans described
in this paper with respect to their effect on a production process not in
statistical control. If the production process is in statistical control, then, as we
have already seen, SPA requires minimum inspection (and, incidentally, because
of the existence of statistical control, produces a fair degree of local stability).
If, when statistical control exists, one requires both maintenance of a given
AOQL and a higher degree of local stability than is produced by SPA, the relevant
comparison is between the Dodge plan and SPC. Both will probably give
good results as regards local stability, but it is not possible at present to make
these intuitive notions precise, as we have not given an exact definition of local
stability. The following example (in which statistical control is assumed) may
not be unrepresentative of what the situation is with regard to the amount of
inspection required.


Fraction of product inspected under the Dodge plan and under SPC when
L = .045, f = .1

         Fraction of product     Fraction of product inspected under SPC when
  p      inspected under the     N_0 = 400      N_0 = 1000      N_0 = 2000
         Dodge plan
 .01           .12                  .12            .10             .10
 .02           .15                  .17            .11             .10
 .03           .19                  .22            .14             .11
 .04           .23                  .28            .19             .15
 .05           .28                  .34            .26             .21
 .06           .33                  .40            .33             .29
 .07           .39                  .45            .39             .37
 .08           .45                  .50            .46             .44
 .09           .52                  .54            .51             .50
 .10           .58                  .57            .55             .55


The decrease in inspection required by SPC as $N_0$ increases is evident in this
table. When $N_0 = 2000$ SPC requires less inspection than the Dodge plan;
when $N_0 = 400$ it requires more inspection than the Dodge plan. How the
various degrees of local stability achieved compare remains an open question.
The case when $N_0 = 400$ probably lies in the region where SPC is inefficient
(as regards amount of inspection) and corresponds to a high degree of local
stability.

We note that both plans call for increased inspection as the quality worsens
(p increases). If the manufacturer is required to pay for the inspection this
serves as an added incentive to improve quality of output.

THE EXPECTED VALUE AND VARIANCE OF THE RECIPROCAL AND
OTHER NEGATIVE POWERS OF A POSITIVE BERNOULLIAN
VARIATE¹

By Frederick F. Stephan
War Production Board, Washington
1. Introduction. The expected value of the reciprocal of a Bernoullian 
variate appears in certain problems of random sampling wherein both practical 
considerations and mathematical necessity make zero an inadmissible value 
of the variate. This special condition excluding zero is necessary from a practical 
standpoint because statistics can not be calculated from an empty class. It is a 
necessary condition, in the mathematical sense, for the expected value, and 
variances involving it, to be finite. When subject to this condition the Bernoullian
variate will be designated the positive Bernoullian variate.

There appears to be no simple expression for the expected value of the reciprocal
such as there is for the expected value of positive integral powers of the
positive Bernoullian variate. This paper presents in (15) a factorial series, 
which can be computed conveniently to any desired number of terms by means 
of the recursion relation (18). Upper and lower bounds on the remainder may 
be computed readily from (20), (21), (23), (24), and (26) and the approximation 
may be improved by adding an estimate of the remainder taken between these 
bounds. A factorial series for the expected value of negative integral powers 
is given in (34). A factorial series for the expected value of the reciprocal of the 
positive hypergeometric variate is given in (53). Series for the variances follow 
directly from the series for expected values. 

A simple example of the sampling problems in which this expected value 
appears is presented by the following instance of estimates derived from samples 
of variable size: 

An infinite population consists of items of two kinds or classes, A and B. 
Lots of N items each are drawn at random. In such lots the number of items,
x′, that are of class A is an ordinary Bernoullian variate. Next, every lot
composed entirely of items of class B is discarded. This excludes all lots for
which x′ = 0. From each remaining lot the N − x′ items of class B are set
aside, leaving a sample composed entirely of items of class A. The number of
such items, x, varies from sample to sample. It will be designated a positive
Bernoullian variate since x = x′ if x′ > 0 and x does not exist if x′ = 0. Finally,
let there be associated with each item in class A a particular value of a variable,
y, the variance of which in A is $\sigma^2$. Then if the mean value of y is computed for
each sample, the error variance of such means is $E(\sigma^2/x) = \sigma^2 E(1/x)$.

Instances similar to that just described occur in the design of sampling surveys
from which statistics are to be obtained separately for each of several classes
of the population, i.e., each statistic is to be computed from some part of the
sample instead of all of it. They also occur in certain sampling problems in
which some of the items drawn for a sample turn out to be blanks.

1 Developed from a section of a paper presented to the Washington meeting of
the Institute of Mathematical Statistics on June 18, 1943.

A related problem concerning the error variance of the proportion of males
among infants born in any one year was considered by G. Bohlmann in a paper
on approximations to the expected value and standard error of a function [1].
His approach to the problem was to expand the function in a Taylor series and
take the expected value of each term. The conditions under which the resulting
series converges were developed for certain functions of a Bernoullian variate.
The present paper provides a different and, in certain respects, superior approach
to the problem employing a method due to Stirling [2]. While the method is
applied to the reciprocal and negative powers it is also applicable to certain
other functions of a Bernoullian variate.


2. The positive Bernoullian variate. Let x be a random variate defined by a
Bernoullian probability function subject to the special condition x > 0. The
probability of x in n is

(1)    $P(x) = \binom{n}{x} p^x q^{n - x} / (1 - q^n)$

where x and n are integers, $1 \le x \le n$, and

(2)    $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}.$

The probabilities p and q are constants, $0 < p = 1 - q < 1$.

The divisor $1 - q^n$ arises from the condition excluding zero. (Bohlmann
omits this factor, assuming that $q^n$ is negligible, an assumption that is not
always valid. In fact, $q^n \approx e^{-np}$.) An extension of this condition to exclude
all values of x less than a specified constant will be considered in a later section.

Throughout this paper summation is understood to be from x = 1 to x = n
unless it is shown otherwise.


3. Expected values and moments. The expected values of x and its positive
integral powers are

(3)    $E(x) = np/(1 - q^n)$

(4)    $E(x^2) = (npq + n^2 p^2)/(1 - q^n)$

and, in general

(5)    $E(x^i) = \nu_i/(1 - q^n) = \sum_{j=1}^{i} \mathfrak{S}_i^j \, j! \binom{n}{j} p^j / (1 - q^n), \quad i > 0$

where $\nu_i$ is the ith moment about zero of an ordinary Bernoullian variate with
the same n and p and the $\mathfrak{S}_i^j$ are the Stirling numbers of the second kind (see
Table 1).
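
The first two moments can be checked numerically against the probability function of the positive Bernoullian variate. The following is a minimal sketch; the function names `positive_binomial_pmf` and `raw_moment` are hypothetical and serve only to illustrate the formulas.

```python
from math import comb


def positive_binomial_pmf(n, p):
    """Return {x: P(x)} for x = 1, ..., n, i.e. the binomial law with zero excluded."""
    q = 1 - p
    norm = 1 - q**n   # the divisor arising from the exclusion of x = 0
    return {x: comb(n, x) * p**x * q ** (n - x) / norm for x in range(1, n + 1)}


def raw_moment(n, p, order):
    """Return E(x^order) by direct summation over the probability function."""
    return sum(x**order * prob for x, prob in positive_binomial_pmf(n, p).items())


if __name__ == "__main__":
    n, p = 12, 0.25
    q = 1 - p
    print(raw_moment(n, p, 1), n * p / (1 - q**n))
    print(raw_moment(n, p, 2), (n * p * q + n**2 * p**2) / (1 - q**n))
```

Each printed pair compares the direct sum with the closed form for E(x) and E(x²) respectively.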

The moments about E(x) are somewhat more complicated than the corresponding
moments of the ordinary Bernoullian variate. For example, the
variance

(6)    $E\{(x - E(x))^2\} = \frac{npq}{1 - q^n} - \frac{n^2 p^2 q^n}{(1 - q^n)^2}$

and the third moment

(7)    $E\{(x - E(x))^3\} = \frac{(q - p)npq}{1 - q^n} - \frac{3n^2 p^2 q^{n+1}}{(1 - q^n)^2} + \frac{n^3 p^3 q^n (1 + q^n)}{(1 - q^n)^3}.$

The moments about np, the first moment of an ordinary Bernoullian variate,
are

(8)    $E\{(x - np)^i\} = (\mu_i + (-1)^{i+1} n^i p^i q^n)/(1 - q^n)$

where $\mu_i$ is the ith moment, about the mean, of an ordinary Bernoullian variate
with the same values n and p.

TABLE 1

Stirling numbers of the second kind, $\mathfrak{S}_i^j$

                              j
   i       1       2        3        4        5        6
   1       1       0        0        0        0        0
   2       1       1        0        0        0        0
   3       1       3        1        0        0        0
   4       1       7        6        1        0        0
   5       1      15       25       10        1        0
   6       1      31       90       65       15        1
   7       1      63      301      350      140       21
   8       1     127      966    1,701    1,050      266
   9       1     255    3,025    7,770    6,951    2,646
  10       1     511    9,330   34,105   42,525   22,827


The expected value of the reciprocal is

(9)    $E\left(\frac{1}{x}\right) = \frac{1}{1 - q^n}\left\{npq^{n-1} + \frac{1}{2} \cdot \frac{1}{2} n(n - 1) p^2 q^{n-2} + \cdots + \frac{1}{n} p^n\right\}.$

This equation is not suitable for the computation of E(1/x) to a satisfactory
degree of approximation unless np is small, say less than 5 for most purposes.
The number of terms necessary to obtain a computed value with four significant
figures, for example, may be estimated to be approximately $8\sqrt{npq}/(1 - q^n)$.
Expressed as a function of q, E(1/x) becomes

(10)    $E\left(\frac{1}{x}\right) = \frac{1}{1 - q^n} \sum \frac{q^{x-1} - q^n}{n - x + 1},$

a series which may be convenient for small values of q.
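
The two expressions for E(1/x) can be compared numerically. The sketch below assumes that the series in q has the form Σ (q^(x-1) - q^n)/(n - x + 1), divided by 1 - q^n, with x running from 1 to n; the function names are hypothetical.

```python
from math import comb


def reciprocal_direct(n, p):
    """E(1/x) by direct summation of (1/x) P(x) over x = 1, ..., n."""
    q = 1 - p
    total = sum(comb(n, x) * p**x * q ** (n - x) / x for x in range(1, n + 1))
    return total / (1 - q**n)


def reciprocal_q_series(n, p):
    """E(1/x) from the series in powers of q (assumed form, see lead-in)."""
    q = 1 - p
    total = sum((q ** (x - 1) - q**n) / (n - x + 1) for x in range(1, n + 1))
    return total / (1 - q**n)


if __name__ == "__main__":
    for n, p in ((3, 0.5), (15, 0.2), (40, 0.9)):
        print(n, p, reciprocal_direct(n, p), reciprocal_q_series(n, p))
```

For n = 1 both expressions reduce to 1, as they must, since x can then only take the value 1.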

E(1/x) may be expanded in a power series by Taylor’s Theorem. It may
also be expanded in a finite series of expected values of powers, either in E(x),
$E(x^2)$, ··· or in E(x − c), $E(x - c)^2$, ···, c being any positive constant. The


second of these three series may be obtained by expanding-^1 — -j and taking. 

expected values, and the third by dividing out - = -v- -r and taking ex- 

35 C "1 \X c) 

pected values. For all three expansions, however, the terms become progres¬ 
sively more complicated and laborious to compute. A simpler and more con¬ 
venient series for actual computations may be obtained by expanding I/35 in a 
factorial series. 

4. Expansion of E(1/x) in a series of inverse factorials. It is easy to prove by induction that, for x > 0,

(11)  $$\frac{1}{x} = \frac{0!}{x+1} + \frac{1!}{(x+1)(x+2)} + \cdots + \frac{(t-1)!\,x!}{(x+t)!} + R_t(x)$$

where

(12)  $$R_t(x) = t!\,(x-1)!/(x+t)!$$

is the remainder after the first t terms. This is, of course, an expansion in Beta functions. It is also a simple special case of the expansion of a function in a "faculty series" or series of inverse factorials [3] with an exact expression for the remainder.

Let

(13)  $$s_i = \frac{1}{1-q^n}\sum_{x=1}^{n}\binom{n+i}{x+i}p^{x+i}q^{n-x}.$$

Then, since

(14)  $$\sum_{x=1}^{n}\frac{x!}{(x+i)!}\binom{n}{x}p^xq^{n-x} = \frac{n!\,s_i\,(1-q^n)}{(n+i)!\,p^i},$$

the expected value of (11) is

(15)  $$E\left(\frac{1}{x}\right) = \frac{0!\,s_1}{(n+1)p} + \frac{1!\,s_2}{(n+1)(n+2)p^2} + \cdots + \frac{(t-1)!\,n!\,s_t}{(n+t)!\,p^t} + \sum_{x=1}^{n}R_t(x)P(x).$$

When developed as infinite series, both (11) and (15) are convergent since the remainders $R_t(x) \to 0$ as $t \to \infty$.

For computing purposes it is convenient to write

(16)  $$E\left(\frac{1}{x}\right) = \sum_{i=1}^{t}u_i + E(R_t(x))$$

in which, since

(17)  $$s_i = s_{i-1} - \binom{n+i-1}{i}\frac{p^iq^n}{1-q^n},$$

the following recursion relation exists between $u_i$ and $u_{i-1}$:

(18)  $$u_i = \frac{(i-1)!\,n!\,s_i}{(n+i)!\,p^i} = \frac{(i-1)u_{i-1} - k/i}{(n+i)p}, \qquad u_1 = \frac{1-k}{(n+1)p},$$

where

(19)  $$k = npq^n/(1-q^n) \le np/(e^{np}-1).$$

This reduces the computing of the $u_i$ to a simple repetitive procedure. The computing is still simpler in those problems in which, for the degree of precision desired, k is negligible.
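
The repetitive procedure of (18) and (19) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, exact rational arithmetic is used for convenience, and the remainder is summed directly from (12) merely to check that the partial sum plus the expected remainder reproduces E(1/x).

```python
from fractions import Fraction
from math import comb, factorial


def positive_binomial_pmf(n, p):
    """P(x), x = 1..n, for the positive (zero-excluded) Bernoullian variate."""
    q = 1 - p
    norm = 1 - q**n
    return {x: comb(n, x) * p**x * q**(n - x) / norm for x in range(1, n + 1)}


def factorial_series_terms(n, p, t):
    """u_1, ..., u_t from the recursion (18), with k taken from (19)."""
    q = 1 - p
    k = n * p * q**n / (1 - q**n)
    u = [(1 - k) / ((n + 1) * p)]  # u_1
    for i in range(2, t + 1):
        u.append(((i - 1) * u[-1] - k / i) / ((n + i) * p))
    return u


def expected_remainder(n, p, t):
    """E(R_t(x)) by direct summation of (12) against P(x)."""
    pmf = positive_binomial_pmf(n, p)
    return sum(
        Fraction(factorial(t) * factorial(x - 1), factorial(x + t)) * px
        for x, px in pmf.items()
    )


# The setting of Example 1: n = 100, p = 0.1.
n, p, t = 100, Fraction(1, 10), 5
u = factorial_series_terms(n, p, t)
direct = sum(px / x for x, px in positive_binomial_pmf(n, p).items())
print(float(u[0]), float(sum(u)), float(direct))
```

Passing p as a Fraction keeps every quantity exact; a float p also works but then the identity holds only to rounding error.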

An estimate of $E(R_t(x))$ should be added to the sum in (16) to improve the approximation. To determine a suitable estimate, a lower bound for the expected value of the remainders may be computed from one of the following inequalities:

(20)  $$E(R_t(x)) = \sum_{x=1}^{n}\frac{t}{x}\cdot\frac{(t-1)!\,x!}{(x+t)!}P(x) \ge \sum_{x=1}^{n}\frac{t}{m}\left(1-\frac{x-m}{m}\right)\frac{(t-1)!\,x!}{(x+t)!}P(x) = \frac{t}{m}\left[\frac{2m+t}{m}\,u_t - \frac{t-1}{m}\,u_{t-1}\right], \qquad m > 0,$$

which is maximized by setting $m = \{(t-1)u_{t-1} - tu_t\}/u_t$, whence

(21)  $$E(R_t(x)) > tu_t^2/\{(t-1)u_{t-1} - tu_t\}, \qquad t > 1.$$

Also, since when m = E(x)

(22)  $$\sum (x-m)\frac{(t-1)!\,x!}{(x+t)!}P(x) < \sum (x-m)P(x) = 0,$$

a simpler inequality is

(23)  $$E(R_t(x)) > tu_t(1-q^n)/np.$$

Further, if only the first c < n terms in (20) are taken,

(24)  $$E(R_t(x)) > \sum_{x=1}^{c}R_t(x)P(x),$$

the terms of which may be computed from

(25)  $$R_t(1)P(1) = \frac{k}{(t+1)q} \quad\text{and}\quad R_t(x)P(x) = R_t(x-1)P(x-1)\cdot\frac{(x-1)(n-x+1)p}{x(x+t)q}.$$

An upper bound may be computed from

(26.1)  $$E(R_t(x)) < tu_t$$

(26.2)  $$E(R_t(x)) < \tfrac{1}{2}tu_t + \tfrac{1}{2}v_1$$

(26.3)  $$E(R_t(x)) < \tfrac{1}{3}tu_t + \tfrac{2}{3}v_1 + \tfrac{1}{6}v_2$$

(26.j)  $$E(R_t(x)) < \frac{1}{j}\,tu_t + \sum_{x=1}^{j-1}\left(\frac{1}{x}-\frac{1}{j}\right)v_x,$$

with $v_x = t!\,x!\,P(x)/(x+t)!$, the choice among which may be governed by computing convenience. Taken with (16), these inequalities provide lower and upper bounds for E(1/x).
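
As a rough numerical illustration of how such bounds bracket E(1/x), the Python sketch below evaluates the lower bounds (21) and (23) and the upper bound (26.3). It rests on assumptions that should be kept in mind: the function names are hypothetical, and the quantity `v(x)` is taken to be t! x! P(x)/(x + t)!, which is an interpretation of the partly illegible source rather than a quotation of it.

```python
from fractions import Fraction
from math import comb, factorial


def direct_expectation(n, p):
    """E(1/x) for the positive Bernoullian variate, by direct summation."""
    q = 1 - p
    norm = 1 - q**n
    return sum(comb(n, x) * p**x * q**(n - x) / norm / x for x in range(1, n + 1))


def remainder_bounds(n, p, t):
    """Partial sum of (16) with lower bounds (21), (23) and upper bound (26.3)."""
    q = 1 - p
    norm = 1 - q**n
    k = n * p * q**n / norm

    def P(x):
        return comb(n, x) * p**x * q**(n - x) / norm

    def v(x):
        # Assumed meaning of v_x: t! x! P(x) / (x + t)!
        return Fraction(factorial(t) * factorial(x), factorial(x + t)) * P(x)

    u = [(1 - k) / ((n + 1) * p)]
    for i in range(2, t + 1):
        u.append(((i - 1) * u[-1] - k / i) / ((n + i) * p))
    ut = u[-1]
    lower_23 = t * ut * norm / (n * p)
    # (21) needs u_{t-1}, so it is only available for t > 1.
    lower_21 = t * ut**2 / ((t - 1) * u[-2] - t * ut) if t > 1 else None
    upper_263 = t * ut / 3 + Fraction(2, 3) * v(1) + v(2) / 6
    return {"partial": sum(u), "lower_21": lower_21,
            "lower_23": lower_23, "upper_263": upper_263}


r = remainder_bounds(100, Fraction(1, 10), 2)
print(float(r["partial"] + r["lower_21"]), float(r["partial"] + r["upper_263"]))
```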


5. Examples. Two examples will serve to illustrate the factorial series (15). 


Example 1

Computation of E(1/x) for n = 100 and p = 0.1

np = 10        k = .000,265,621        E(1/x) = .111,527

         Binomial       ------------------ Factorial series ------------------
   t     sum of t       Sum of t       Lower bounds*             Upper
         terms          terms                                    bound**

   1     .000,295       .098,984       .099,647                  .132,167
   2     .001,107       .108,675       .109,006 (.111,034)       .115,247
   3     .003,071       .110,548       .110,752 (.111,313)       .112,498
   4     .007,039       .111,082       .111,223 (.111,381)       .111,852
   5     .013,813       .111,280       .111,385 (.111,452)       .111,657
   6     .023,743       .111,370       .111,452 (.111,478)       .111,587
   7     .036,442       .111,416       .111,483 (.111,489)       .111,556
   8     .050,796       .111,444       .111,500 (.111,497)       .111,544
   9     .065,287       .111,461       .111,509 (.111,503)       .111,537
  10     .078,474       .111,472       .111,514 (.111,508)       .111,534
  11     .089,372       .111,481       .111,518 (.111,511)       .111,532
  12     .097,604       .111,487       .111,520                  .111,530
  13     .103,320       .111,492       .111,521                  .111,529
  14     .106,985       .111,495       .111,523                  .111,529
  15     .109,164       .111,498       .111,524                  .111,529
  16     .110,369       .111,501       .111,524                  .111,528
  17     .110,992       .111,503       .111,525                  .111,528
  18     .111,294       .111,505       .111,525,4                .111,527,5
  19     .111,431       .111,506       .111,525,6                .111,527,3
  20     .111,489       .111,508       .111,525,8                .111,527,1

  24     .111,526
 100     .111,527 (end of series)

* Sum of t terms plus lower bound for $E(R_t(x))$ from (24) with c = 3. Numbers in parentheses are calculated from (21).

** Sum of t terms plus upper bound on $E(R_t(x))$ from (26.3).


Example 2

Computation of E(1/x) for n = 1000 and p = 0.3

np = 300        k = 9.7 × 10^-14

   t     Sum of t terms         Factorial series upper and lower bounds*

   1     .003,330,003,330
   2     .003,341,081,185       { .003,346,7
                                { .003,341,0 (.003,341,155,4)
   3     .003,341,154,817       { .003,341,211
                                { .003,341,155
   4     .003,341,155,549       { .003,341,156,29
                                { .003,341,155,56
   5     .003,341,155,559       { .003,341,155,58
                                { .003,341,155,57

* Computed as in Example 1.

For the binomial series, the sum of the largest eight terms of (9), not the first eight terms, is approximately .0007 which is less than 1/4 of the value of E(1/x).


In the first example the value of np is almost small enough to make computation by (9) convenient. In the second example about 120 terms of (9) must be computed to obtain an approximation to four significant figures but only four terms of the factorial series are needed to obtain seven significant figures. It is evident that as np increases, the number of terms of (16) required to obtain an approximation to a given number of significant figures decreases. The opposite is true of (9) as n increases, or as p approaches a value near 1/2.


6. Extending the special condition. In some sampling problems all values of x less than a specified value, g, and greater than another specified value, h, are inadmissible. Then the probability of x in n is

(27)  $$P(x \mid g, h) = \binom{n}{x}p^xq^{n-x}/s_{0,g,h}, \qquad g \le x \le h,$$

where

(28)  $$s_{0,g,h} = \sum_{x=g}^{h}\binom{n}{x}p^xq^{n-x}.$$

With this new condition, E(1/x) is given by (15) if $s_i$ is replaced by

(29)  $$s_{i,g,h} = \sum_{x=g}^{h}\binom{n+i}{x+i}\frac{p^{x+i}q^{n-x}}{s_{0,g,h}}$$

and the summation in the remainder term is from g to h. Also since

(30)  $$s_{i,g,h} = s_{i-1,g,h} - \left[\binom{n+i-1}{g+i-1}p^{g+i-1}q^{n-g+1} - \binom{n+i-1}{h+i}p^{h+i}q^{n-h}\right]\Big/ s_{0,g,h},$$

a recursion relation similar to (18) may be used in computing

(31)  $$u_{i,g,h} = \frac{(i-1)!\,n!\,s_{i,g,h}}{(n+i)!\,p^i} = \frac{(i-1)u_{i-1,g,h} - (i-1)!\left\{\dfrac{k_g}{(g+i-1)!} - \dfrac{(n-h)p\,k_h}{q\,(h+i)!}\right\}}{(n+i)p}$$

where

(32)  $$k_g = \frac{n!\,p^gq^{n-g+1}}{(n-g)!\,s_{0,g,h}}$$

(33)  $$k_h = \frac{n!\,p^hq^{n-h+1}}{(n-h)!\,s_{0,g,h}}.$$

The inequalities (20) to (23) inclusive and (26) are applicable to this extension on substitution of $u_{i,g,h}$ for $u_i$.

7. Expansion of $E(x^{-a})$ in a factorial series. Equation (11) may be extended to other negative integral powers of x. If a is a positive integer

(34)  $$E(x^{-a}) = \sum_{i=1}^{t}\frac{b_{i,a}\,u_i}{(i-1)!} + E(R'_t(x))$$

where

(35)  $$E(R'_t(x)) = \sum_{x=1}^{n}R'_t(x)P(x) = \sum_{x=1}^{n}\sum_{j=1}^{a}b_{t+1,j}\,\frac{x^{j-1}\,x!\,P(x)}{(x+t)!\,x^{a}}$$

and the $b_{i,j}$ are the absolute values of the Stirling numbers of the first kind (see Table 2) formed by the recursion relation

(36)  $$b_{i,j} = b_{i-1,j-1} + (i-1)b_{i-1,j}, \qquad b_{i,j} = 0 \text{ if } j > i \text{ or } j < 1.$$

It is evident that

(37)  $$\sum_{j=1}^{i}b_{i,j} = i!,$$

(38)  $$b_{i,1} = (i-1)! \quad\text{and}\quad b_{i,j} < i! \text{ if } j > 1,$$

whence

(39)


Bid) = 


<+ 1 


P(l) 


d^/_\. at + D! - t\)x\p(x) 

Rt{x) <-2(xT<)I-’ 


x > 1 


(t + 1 ) 


P(x). 
Hence $R'_t(x) \to 0$ and $E(R'_t(x)) \to 0$ as $t \to \infty$ and the sum of the first t terms of (34) converges to $E(x^{-a})$ as $t \to \infty$.

The following recursion relation corresponding to (18) provides a simple procedure for computing:

(40)  $$u_{i,a} = b_{i,a}u_i/(i-1)! = \frac{b_{i,a}\{(u_{i-1,a}/b_{i-1,a}) - k/i!\}}{(n+i)p}.$$

The computing procedure, then, follows a cycle of four simple operations:

1. Divide {k/(i − 1)!} by i.

2. Subtract the quotient from {$u_{i-1,a}/b_{i-1,a}$}.

3. Divide the difference by {(n + i − 1)p} + p. The quotient is $u_{i,a}/b_{i,a}$.

4. Multiply this quotient by $b_{i,a}$.
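
The four-operation cycle can be sketched as follows. This is a hedged illustration, not the author's own worksheet: the function names are hypothetical, the running quantity `ratio` stands for $u_i/(i-1)!$ (which plays the role of $u_{i,a}/b_{i,a}$ and avoids dividing by a zero $b_{i-1,a}$), and the remainder is taken in the form $\sum_j b_{t+1,j}\,x!/\{x^{a-j+1}(x+t)!\}$, an assumed reading of (35).

```python
from fractions import Fraction
from math import comb, factorial


def unsigned_stirling_first(imax, a):
    """b[i][j] for i = 0..imax, j = 0..a, built from the recursion (36)."""
    b = [[0] * (a + 1) for _ in range(imax + 1)]
    b[1][1] = 1
    for i in range(2, imax + 1):
        for j in range(1, a + 1):
            b[i][j] = b[i - 1][j - 1] + (i - 1) * b[i - 1][j]
    return b


def inverse_power_terms(n, p, a, t):
    """u_{1,a}, ..., u_{t,a} following the cycle described for (40)."""
    q = 1 - p
    k = n * p * q**n / (1 - q**n)
    b = unsigned_stirling_first(t, a)
    ratio = (1 - k) / ((n + 1) * p)  # u_1 / 0!
    terms = [b[1][a] * ratio]
    i_factorial = 1
    for i in range(2, t + 1):
        i_factorial *= i
        ratio = (ratio - k / i_factorial) / ((n + i) * p)  # steps 1 to 3
        terms.append(b[i][a] * ratio)                      # step 4
    return terms


def positive_binomial_pmf(n, p):
    q = 1 - p
    norm = 1 - q**n
    return {x: comb(n, x) * p**x * q**(n - x) / norm for x in range(1, n + 1)}


def expected_remainder(n, p, a, t):
    """E(R'_t(x)) by direct summation (assumed form of the remainder)."""
    b = unsigned_stirling_first(t + 1, a)
    total = 0
    for x, px in positive_binomial_pmf(n, p).items():
        r = sum(
            Fraction(b[t + 1][j] * factorial(x), x ** (a - j + 1) * factorial(x + t))
            for j in range(1, a + 1)
        )
        total += r * px
    return total


n, p, a, t = 20, Fraction(3, 10), 2, 6
partial = sum(inverse_power_terms(n, p, a, t))
direct = sum(px / x**a for x, px in positive_binomial_pmf(n, p).items())
print(float(partial), float(direct))
```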


TABLE 2

Absolute values of Stirling numbers of the first kind, $b_{i,j}$*

   i \ j         1            2            3          4          5         6

     1           1            0            0          0          0         0
     2           1            1            0          0          0         0
     3           2            3            1          0          0         0
     4           6           11            6          1          0         0
     5          24           50           35         10          1         0
     6         120          274          225         85         15         1
     7         720        1,764        1,624        735        175        21
     8       5,040       13,068       13,132      6,769      1,960       322
     9      40,320      109,584      118,124     67,284     22,449     4,536
    10     362,880    1,026,576    1,172,700    723,680    269,325    63,273


* These numbers are also known as differential coefficients of zero [4]. 


The expressions in braces are quantities obtained in the preceding cycle.

The $u_{i,a}$ may also be calculated from (18), or checked by such a calculation. A lower bound for $E(R'_t(x))$ after t terms may be calculated from the first c terms of

(41)  $$E(R'_t(x)) = \sum_{x=1}^{n}R'_t(x)P(x) > \sum_{x=1}^{c}R'_t(x)P(x) = \sum_{x=1}^{c}\sum_{j=1}^{a}\frac{b_{t+1,j}\,n!\,p^xq^{n-x}}{x^{a-j+1}\,(x+t)!\,(n-x)!\,(1-q^n)}$$


or from an inequality similar to (23) 

(«> *<«'<«» 





which may also be written 


(43) 


E(R'(x)) > - (t _ W(B(z)) M {<*<*) + 0(*M + t - 1) • • • E(x) 

*+i 

- Z b l+ um?\. 


An upper bound may be calculated from

(44)  $$E(R'_t(x)) < \frac{u_t}{(t-1)!}\sum_{j=1}^{a}b_{t+1,j} < a\,t\,(t+1)\,u_t$$

or 

e no X^P(x) 

E(R\x )) < Z R'(x)P(x) + Z Z 

(t 4- 

*-l xm>c+l j-1 “T t)lC 

(45) < £ R'(x)P{x) + — ‘ £^=£ R'(x)P(x) 

*-l (i — lj! y-i c° *-l 

+ (« - l)!c“ +1 {(c + 0(c + <-])••• C - Z +1 


8. The positive hypergeometric variate. The theory of sampling without replacement from a finite population rests on the hypergeometric variate. Its probability function is

(46)  $$P(x \mid N, M, n) = \binom{M}{x}\binom{N-M}{n-x}\Big/\binom{N}{n}.$$

In applications to finite sampling, N is the number of items in the population, M is the number of them that are of a certain kind, n is the number of items drawn for the sample, and x is the number of items of the designated kind in the sample.

As in the case of the Bernoullian variate, it is necessary to exclude zero in defining the expected value of 1/x. The probability function of the positive hypergeometric variate, then, is

(47)  $$P_H(x) = P(x \mid N, M, n)/s_0, \qquad x > 0$$

where

(48)  $$s_0 = 1 - P(0 \mid N, M, n).$$

Throughout this section the notation will have reference to (47) instead of (1). The expected values of positive integral powers of x are

(49)  $$E(x) = Mn/(Ns_0)$$

(50)  $$E(x^2) = \frac{1}{s_0}\left\{\frac{M(M-1)n(n-1)}{N(N-1)} + \frac{Mn}{N}\right\}$$

and, in general,

(51)  $$E(x^i) = \sum_{j=1}^{i}\mathfrak{S}_i^j\,E\big(x!/(x-j)!\big)$$

where the $\mathfrak{S}_i^j$ are the Stirling numbers of the second kind and

(52)  $$E\left(\frac{x!}{(x-j)!}\right) = \frac{M!\,n!\,(N-j)!}{(M-j)!\,(n-j)!\,N!\,s_0}.$$
The factorial series corresponding to (16) is

(53)  $$E\left(\frac{1}{x}\right) = \sum_{x}\frac{1}{x}P_H(x) = \sum_{i=1}^{t}u_i + E(R_t(x))$$

where

(54)  $$u_i = \sum_{x}\frac{(i-1)!\,x!}{(x+i)!}P_H(x)$$

and

(55)  $$E(R_t(x)) = \sum_{x}\frac{t!\,(x-1)!}{(x+t)!}P_H(x).$$

The $u_i$ may be computed from

(56)  $$u_1 = \frac{(N+1)s_1}{(M+1)(n+1)s_0} = \frac{N+1}{(M+1)(n+1)s_0} - \frac{(N-M)!\,(N-n)!}{s_0\,N!\,(N-M-n-1)!}\left[\frac{1}{(M+1)(n+1)} + \frac{1}{N-M-n}\right]$$

and the recursion relation

(57)  $$u_i = \frac{(i-1)(N+i)\,s_i}{(M+i)(n+i)\,s_{i-1}}\,u_{i-1}$$

where

(58)  $$s_i = 1 - \sum_{x=0}^{i}P(x \mid N+i, M+i, n+i).$$

The computing is quite simple in those instances in which $1 - s_t$ is negligible.
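
A minimal Python sketch of this computation for the positive hypergeometric variate is given below. It assumes that $s_i$ in (58) means one minus the sum of the hypergeometric probabilities of 0, 1, ..., i with parameters N + i, M + i, n + i; that reading, and all function names, are assumptions made for illustration. The remainder is summed directly so that the identity (53) can be checked.

```python
from fractions import Fraction
from math import comb, factorial


def hyper_pmf(x, N, M, n):
    """P(x | N, M, n) of (46) as an exact fraction."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))


def tail_s(i, N, M, n):
    """Assumed s_i of (58): 1 - sum_{x=0}^{i} P(x | N+i, M+i, n+i)."""
    return 1 - sum(hyper_pmf(x, N + i, M + i, n + i) for x in range(i + 1))


def factorial_series_terms(N, M, n, t):
    """u_1, ..., u_t from the first form of (56) and the recursion (57)."""
    s0 = tail_s(0, N, M, n)
    s_cur = tail_s(1, N, M, n)
    u = [(N + 1) * s_cur / ((M + 1) * (n + 1) * s0)]
    for i in range(2, t + 1):
        s_prev, s_cur = s_cur, tail_s(i, N, M, n)
        u.append((i - 1) * (N + i) * s_cur * u[-1] / ((M + i) * (n + i) * s_prev))
    return u


def positive_hyper_pmf(N, M, n):
    """P_H(x) of (47) for x = 1, ..., min(M, n)."""
    s0 = 1 - hyper_pmf(0, N, M, n)
    return {x: hyper_pmf(x, N, M, n) / s0 for x in range(1, min(M, n) + 1)}


def expected_remainder(N, M, n, t):
    """E(R_t(x)) of (55) by direct summation."""
    return sum(
        Fraction(factorial(t) * factorial(x - 1), factorial(x + t)) * px
        for x, px in positive_hyper_pmf(N, M, n).items()
    )


N, M, n, t = 50, 10, 8, 4
u = factorial_series_terms(N, M, n, t)
direct = sum(px / x for x, px in positive_hyper_pmf(N, M, n).items())
print(float(sum(u)), float(direct))
```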

Corresponding to (26), an upper bound for the expected value of the remainders after t terms may be computed from

(59.1)  $$E(R_t(x)) < tu_t$$

(59.2)  $$E(R_t(x)) < \tfrac{1}{2}tu_t + \tfrac{1}{2}P_H(1)/(t+1)$$

(59.3)  $$E(R_t(x)) < \tfrac{1}{3}tu_t + \frac{2P_H(1)}{3(t+1)} + \frac{P_H(2)}{3(t+1)(t+2)}$$

(59.j)  $$E(R_t(x)) < \frac{1}{j}\,tu_t + \sum_{x=1}^{j-1}\left(\frac{1}{x}-\frac{1}{j}\right)\frac{t!\,x!\,P_H(x)}{(x+t)!}.$$

A lower bound for the expected value of the remainders may be computed from one of the following inequalities corresponding to (23), (21) and (24)

(60)  $$E(R_t(x)) > tu_tNs_0/(Mn)$$

(61)  $$E(R_t(x)) > tu_t^2/[(t-1)u_{t-1} - tu_t]$$

(62)  $$E(R_t(x)) > \sum_{x=1}^{c}\frac{t!\,(x-1)!}{(x+t)!}P_H(x).$$
The expected values of other negative integral powers of the positive hypergeometric variate may be calculated from

(63)  $$E(x^{-a}) = \sum_{i=1}^{t}b_{i,a}u_i/(i-1)! + E(R'_t(x))$$

where

(64)  $$E(R'_t(x)) = \sum_{x}R'_t(x)P_H(x) = \sum_{x}\sum_{j=1}^{a}b_{t+1,j}\,\frac{x^{j-1}\,x!\,P_H(x)}{x^{a}\,(x+t)!}.$$

With $P_H(x)$ substituted for P(x), (39), (42), (43), (44), and (45) provide lower and upper bounds for $E(R'_t(x))$ for the positive hypergeometric variate. Also, corresponding to (41)

(65)  $$E(R'_t(x)) > \sum_{x=1}^{c}R'_t(x)P_H(x).$$


9. Variance and moments of 1/x and $x^{-a}$. The variance of 1/x, which is $E(1/x^2) - (E(1/x))^2$, may be calculated from (16) and (34), with a = 2, for the positive Bernoullian variate, and from (53) and (63), with a = 2, for the positive hypergeometric variate. Likewise, the variance of $x^{-a}$ and the moments of 1/x and $x^{-a}$ about E(1/x) may be computed by the usual formulae.
RANDOM WALK IN THE PRESENCE OF ABSORBING BARRIERS

M. Kac 

Cornell University 

1. Introduction. The problem of random walk (along a straight line) in the 
presence of absorbing barriers can be stated as follows: 

A particle, starting at the origin, moves in such a way that its displacements in consecutive time intervals, each of duration Δt, can be represented by independent random variables

$$X_1, X_2, X_3, \cdots.$$

Moreover, if at some time the total (cumulative) displacement becomes > p (p > 0) or < −q (q > 0) the particle gets absorbed. The problem is to determine the probability that "the length of life" of the particle is greater than a given number t. This problem also admits an interpretation in terms of a game
of chance in which the player quits when he loses more than q or wins more than 
p. An interesting paper on this type of problem by A. Wald 1 appeared recently 
in the Annals. Wald assumes that the X’s are identically distributed and that 
their mean and standard deviation are different from 0. 2 He is then mostly 
interested in the limiting case when both the mean and the standard deviation 
become small. The object of this paper is to propose a different method of 
attack which in some cases leads to an answer in closed form. The method we 
use has been employed repeatedly in statistical mechanics in the study of the 
so called order-disorder problem. It is due, I believe, to E. W. Montroll 3 . As 
far as the author knows this method was never used in connection with the 
classical probability theory and this seems to furnish an additional reason for 
publishing this paper. 

2. The simplest discrete case. We assume that each X is capable of assuming the values 1 and −1 each with probability ½ and for simplicity's sake we let Δt = 1. Note that, unlike in Wald's case, the mean of X is 0. Denote by N the random variable which represents the "length of life" of the particle and let (m an integer)

$$\delta(m) = \begin{cases}\tfrac{1}{2}, & m = 1 \text{ or } m = -1,\\ 0, & \text{otherwise.}\end{cases}$$


1 A. Wald “On cumulative sums of random variables,” Annals of Math. Stat., Vol. 15 
(1944), pp. 283-296. 

2 Since this was written Professor Wald informed the author that he can easily avoid the condition that the mean should be zero.

3 See for instance E. W. Montroll, "Statistical Mechanics of nearest neighbor systems," Jour. of Chem. Physics, Vol. 9 (1941), pp. 706-721.



Clearly we have (throughout this section we assume that both p and q are integers)

$$\text{Prob.}\{N > n\} = \text{Prob.}\{-q \le X_1 \le p,\; -q \le X_1 + X_2 \le p,\; \cdots,\; -q \le X_1 + \cdots + X_n \le p\} = \sum \delta(m_1)\,\delta(m_2)\cdots\delta(m_n),$$

where the summation is extended over all integers $m_1, m_2, \cdots, m_n$ for which $-q \le m_1 \le p$, $-q \le m_1 + m_2 \le p$, $\cdots$, $-q \le m_1 + m_2 + \cdots + m_n \le p$. Letting

$$l_j = q + m_1 + \cdots + m_j, \qquad (j = 1, 2, \cdots, n),$$

we see that

(1)  $$\text{Prob.}\{N > n\} = \sum \delta(l_1 - q)\,\delta(l_2 - l_1)\cdots\delta(l_n - l_{n-1}).$$

Let us now consider the (p + q + 1) by (p + q + 1) matrix

(2)  $$A = ((\delta(i-k))) = \begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & \cdots\\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & \cdots\\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & \cdots\\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdots \end{pmatrix}.$$

It is easily seen that the sum in (1) is equal to the sum of the elements in the (q + 1)-st column (or row) of the matrix $A^n$. Thus

Prob.{N > n} = sum of the elements of the (q + 1)-st column of $A^n$.
Denote by $\lambda_1, \lambda_2, \cdots, \lambda_{p+q+1}$ the eigenvalues of the matrix A and let

$$\left(x_1^{(j)}, x_2^{(j)}, \cdots, x_{p+q+1}^{(j)}\right)$$

be the normalized eigenvector of A belonging to the eigenvalue $\lambda_j$. It can be shown by elementary means 4 that

$$\lambda_j = \cos\frac{\pi j}{p+q+2}$$


4 Matrices of type (2) have been introduced and studied in various connections. In a paper by R. P. Boas and the present author recently accepted by the Duke Mathematical Journal references to several authors are given. In order to find the eigenvalues and the eigenvectors of (2) it suffices to know that

$$\begin{vmatrix} 1 & a & 0 & 0 & \cdots\\ a & 1 & a & 0 & \cdots\\ 0 & a & 1 & a & \cdots\\ 0 & 0 & a & 1 & \cdots\\ \cdot & \cdot & \cdot & \cdot & \cdots \end{vmatrix} = \frac{\rho_1^{m+1} - \rho_2^{m+1}}{\rho_1 - \rho_2},$$

where m is the order of the matrix and $\rho_1$ and $\rho_2$ are the roots of the equation $\rho^2 - \rho + a^2 = 0$.
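and

$$x_k^{(j)} = \frac{\sqrt{2}}{\sqrt{p+q+2}}\,\sin\frac{\pi jk}{p+q+2}.$$

Denoting by R the orthogonal matrix

$$R = \begin{pmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_{p+q+1}^{(1)}\\ x_1^{(2)} & x_2^{(2)} & \cdots & x_{p+q+1}^{(2)}\\ \cdot & \cdot & \cdots & \cdot\\ x_1^{(p+q+1)} & x_2^{(p+q+1)} & \cdots & x_{p+q+1}^{(p+q+1)} \end{pmatrix}$$

and by R' the transposed of R we have (since the eigenvalues of A are simple) by a well known theorem

$$A^n = R'\begin{pmatrix} \lambda_1^n & & 0\\ & \ddots & \\ 0 & & \lambda_{p+q+1}^n \end{pmatrix}R.$$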

It thus follows by an easy computation that the sum of the elements of the (q + 1)-st column (row) of $A^n$ is

$$\sum_{r=1}^{p+q+1}\sum_{j=1}^{p+q+1}\lambda_j^n\,x_r^{(j)}x_{q+1}^{(j)} = \sum_{j=1}^{p+q+1}\lambda_j^n\,x_{q+1}^{(j)}\left(\sum_{r=1}^{p+q+1}x_r^{(j)}\right).$$

We have

$$\sum_{r=1}^{p+q+1}x_r^{(j)} = \frac{\sqrt{2}}{\sqrt{p+q+2}}\sum_{r=1}^{p+q+1}\sin\frac{\pi jr}{p+q+2} = \begin{cases} 0, & j \text{ even},\\[4pt] \dfrac{\sqrt{2}}{\sqrt{p+q+2}}\cot\dfrac{\pi j}{2(p+q+2)}, & j \text{ odd}, \end{cases}$$

and therefore 5

$$\text{Prob.}\{N > n\} = \frac{2}{p+q+2}\,{\sum_{j=1}^{p+q+1}}^{\!*}\cos^n\frac{\pi j}{p+q+2}\,\sin\frac{\pi j(q+1)}{p+q+2}\,\cot\frac{\pi j}{2(p+q+2)},$$

where the star on the summation sign indicates that only odd j's are taken into account.

The method just illustrated is quite general but in more complicated cases the job of finding the eigenvalues and eigenvectors becomes formidable.
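
The closed formula can be checked numerically against a direct propagation of the walk, which amounts to applying the matrix A of (2) n times to the unit vector at the (q + 1)-st position. The Python sketch below is illustrative only; the function names are hypothetical, and survival is taken to mean that every partial sum stays between −q and p inclusive.

```python
from math import cos, pi, sin, tan


def survival_by_recursion(p, q, n):
    """Prob{N > n}: step the walk on the states -q..p, dropping absorbed mass."""
    size = p + q + 1
    prob = [0.0] * size
    prob[q] = 1.0  # index q corresponds to the origin
    for _ in range(n):
        nxt = [0.0] * size
        for i, mass in enumerate(prob):
            if mass:
                if i > 0:
                    nxt[i - 1] += 0.5 * mass
                if i + 1 < size:
                    nxt[i + 1] += 0.5 * mass
        prob = nxt
    return sum(prob)


def survival_closed_form(p, q, n):
    """The trigonometric sum over odd j obtained from the eigen-decomposition."""
    L = p + q + 2
    total = 0.0
    for j in range(1, L, 2):  # odd j among 1, ..., p + q + 1
        a = pi * j / L
        total += cos(a) ** n * sin(a * (q + 1)) / tan(a / 2)
    return 2.0 * total / L


print(survival_by_recursion(3, 2, 7), survival_closed_form(3, 2, 7))
```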


5 Professor Feller has called the author's attention to the fact that similar problems and formulas can be found in Chapter III of W. Burnside's Theory of Probability (Cambridge, 1928). He also pointed out that the problem could be treated by means of Markoff chains.
Professor G. E. Uhlenbeck has pointed out that our formula implies a known result from the theory of Brownian motion.

Consider a free Brownian particle which at t = 0 is at $x = x_0$ ($x_0 > 0$). R. Fürth 6 has shown that the probability that between t and t + dt the particle will be either at x = 0 or at x = d ($0 < x_0 < d$) for the first time, is given by the formula

$$\frac{4\pi D}{d^2}\,dt\sum_{m=0}^{\infty}(2m+1)\,e^{-(2m+1)^2\pi^2Dt/d^2}\,\sin\frac{(2m+1)\pi x_0}{d},$$

where D is the "coefficient of diffusion."

If we treat the one-dimensional Brownian motion as a random walk with steps ±Δx, each move lasting Δt, the probability that a particle starting from $x_0$ will not have reached 0 or d in the time interval (0, t) can be calculated by means of our formula.

We must only put $q = x_0/\Delta x$, $p = (d - x_0)/\Delta x$, $n = t/\Delta t$ and assume that as both Δx and Δt approach 0 the ratio $(\Delta x)^2/2\Delta t$ approaches the "coefficient of diffusion" D.

An elementary computation shows that in this limit the Prob.{N > t/Δt} approaches

$$\frac{4}{\pi}\,{\sum_{j}}^{*}\,\frac{1}{j}\,e^{-\pi^2j^2Dt/d^2}\,\sin\frac{\pi jx_0}{d}$$

and that the differential of this expression (with a minus sign) gives exactly Fürth's expression.


3. General theory in the continuous case. We now assume that the distribution function of X possesses a continuous and even density function ρ(x). We have

$$\text{Prob.}\{N > n\} = \int\cdots\int_{\Omega}\rho(x_1)\cdots\rho(x_n)\,dx_1\cdots dx_n,$$

where the region of integration Ω is defined by the inequalities

$$-q < x_1 < p,\quad -q < x_1 + x_2 < p,\quad \cdots,\quad -q < x_1 + \cdots + x_n < p.$$

Introducing the new variables

$$y_j = q + x_1 + \cdots + x_j, \qquad (j = 1, 2, \cdots, n),$$

we see that the Jacobian of the transformation is 1 and

(3)  $$\text{Prob.}\{N > n\} = \int_0^{p+q}\cdots\int_0^{p+q}\rho(y_1 - q)\,\rho(y_2 - y_1)\cdots\rho(y_n - y_{n-1})\,dy_1\cdots dy_n.$$

Consider the symmetric integral equation

(4)  $$\int_0^{p+q}\rho(s-t)f(t)\,dt = \lambda f(s)$$


6 Ann. d. Phys. 53 (1917) p. 177.
and note that if $K_n(s, t)$ denotes the n-th iterated kernel of this integral equation, the right side of (3) is equal to

$$\int_0^{p+q}K_n(q, t)\,dt.$$

Thus

$$\text{Prob.}\{N > n\} = \int_0^{p+q}K_n(q, t)\,dt.$$

From the general theory of integral equations we know that

$$K_n(s, t) = \sum_{j=1}^{\infty}\lambda_j^n f_j(s)f_j(t), \qquad (n \ge 2),$$

where $\lambda_1, \lambda_2, \cdots$ are eigenvalues and $f_1(t), f_2(t), \cdots$ normalized eigenfunctions of the integral equation (4).

Since ρ was assumed to be continuous it follows that the eigenfunctions are continuous and

$$\text{Prob.}\{N > n\} = \sum_{j=1}^{\infty}\lambda_j^n f_j(q)\int_0^{p+q}f_j(t)\,dt.$$

This formula is very general and provides, in a sense, a complete solution of the problem in the continuous and symmetric case. Unfortunately the usefulness of this formula is limited by the difficulties encountered in solving integral equations of the type (4).

In fact, the integral equation to which one is led by considering the normally distributed X's, appears to be very difficult to solve.
and differentiating twice with respect to s we obtain the differential equation

$$f''(s) + \left(\frac{1}{\lambda} - 1\right)f(s) = 0.$$

Substituting the general solution of this equation in (6) we find in an entirely elementary fashion that

$$\lambda_j = \frac{1}{1 + y_j^2}, \qquad f_j(t) = \frac{\sin y_jt + y_j\cos y_jt}{\sqrt{1 + \tfrac12(p+q)(1+y_j^2)}},$$

where $y_j$ is the jth (positive) root of the transcendental equation

(7)  $$\tan (p+q)y = -\frac{2y}{1 - y^2}.$$

We have

$$\int_0^{p+q}(\sin y_jt + y_j\cos y_jt)\,dt = \frac{1}{y_j}\{1 - \cos (p+q)y_j + y_j\sin (p+q)y_j\}$$

and it is easily seen that (7) implies

$$1 - \cos (p+q)y_j + y_j\sin (p+q)y_j = \begin{cases} 0 & \text{if } \cos (p+q)y_j = \dfrac{1 - y_j^2}{1 + y_j^2},\\[6pt] 2 & \text{if } \cos (p+q)y_j = -\dfrac{1 - y_j^2}{1 + y_j^2}. \end{cases}$$

Finally,

$$\text{Prob.}\{N > n\} = 2\,{\sum_j}'\,\frac{\sin y_jq + y_j\cos y_jq}{(1 + y_j^2)^n\,y_j\,\{1 + \tfrac12(p+q)(1+y_j^2)\}},$$

where the dash on the summation sign indicates that only those j's are taken into account for which

$$\cos (p+q)y_j = -\frac{1 - y_j^2}{1 + y_j^2}.$$

We omit here the discussion of various limiting cases inasmuch as our main purpose was to obtain exact formulas.

There are indications that some of the limiting cases are related to singular integral equations with continuous spectra. We may return to this subject at a later date.



ON THE CLASSIFICATION OF OBSERVATION DATA 
INTO DISTINCT GROUPS 

By R. v. Mises 
Harvard University 

Introduction. In scholastic examinations as well as in the examination of industrial products the following probability problem arises. The individuals of a certain population are successively subjected to trials each of which leads to a definite score x (one real number or a group of m real numbers). Each individual is supposed to belong to one of n classes. These classes are characterised by n probability densities $p_1(x), p_2(x), \cdots, p_n(x)$. One has to decide on the basis of the observed value x to which class the respective individual belongs and one wishes to make this decision with the smallest possible risk of failure.

For example, let us consider an examination where the three grades A, B, C are attributed on the basis of a simple score x (case m = 1, n = 3). It may be assumed that an individual of the class A has a mean expected value of x equal to $\vartheta_1 = 75$ and a normal distribution with the standard deviation $\sigma_1 = 4/\sqrt{2}$. The analogous values for the classes B and C may be $\vartheta_2 = 50$, $\sigma_2 = 8/\sqrt{2}$ and $\vartheta_3 = 25$, $\sigma_3 = 12/\sqrt{2}$. In this case, the solution developed in the present paper allows the conclusion that the best way of grading would be to attribute the grade A to scores x beyond 70.0, the grade C to scores below 40.0 and B to the rest. The corresponding error risk will be 3.9% or the success rate 0.961.

There exists, of course, one case where the solution is trivial. If the probability densities $p_\nu(x)$ are limited to n non-overlapping regions $R_\nu$ (with $p_\nu = 0$ at points outside $R_\nu$) an obvious decision can be made without any risk of failure. An assumption of this kind underlies the usual procedure of grading. If, in the foregoing example, an individual of class A is supposed to have at any rate a score beyond 60 and a class C individual less than 40, it is obvious how the grades should be attributed without incurring any risk. It seems, however, that in many problems the assumption of normal distributions or some other kind of overlapping distributions is more appropriate. Then, the probability problem has to be solved.

The solution submitted in the present paper is derived from the simplest principles of calculus of probability without any arbitrary assumption or hypothesis. If n equals 2, the problem can also be considered as a problem of testing a simple statistical hypothesis with a two-valued parameter. 1 It has been shown in an earlier paper 2 that under this restriction success rates higher than 50% are obtainable.


1 See A. Wald, Annals of Math. Stat., Vol. 15 (1944), p. 145. Here, both $p_1(x)$ and $p_2(x)$ are supposed to be normal distributions with the same covariance matrix. The problem treated by Wald is different from the one considered in the present paper since in Wald's paper the parameters of the two multivariate normal distributions are assumed to be unknown.

2 R. v. Mises, Annals of Math. Stat., Vol. 14 (1943), p. 238.
1. Statement of the problem. For each of n classes of individuals a probability density $p_\nu(x)$, $\nu = 1, 2, \cdots, n$, is given. We subdivide the m-dimensional x-space into n regions and assign the region $R_\nu$ to the νth class.

The probability, for an individual of this class, to have its x-value falling in $R_\nu$ is

(1)  $$P_\nu = \int_{(R_\nu)}p_\nu(x)\,dX, \qquad \nu = 1, 2, \cdots, n$$

where dX denotes the element of the x-space (dX = dx in the case m = 1). In the N first trials of the indefinite sequence of trials, $N_\nu$ individuals that belong to the νth class will be tested. Out of these only those individuals whose x-value falls in $R_\nu$ will be ascribed to the νth class. Their number, according to the definition of probability, equals $N_\nu(P_\nu + \epsilon_\nu)$ where $\epsilon_\nu$ tends towards zero as $N_\nu$ goes to infinity. The total number of correct decisions during the N first trials is therefore

(2)  $$N_1(P_1 + \epsilon_1) + N_2(P_2 + \epsilon_2) + \cdots + N_n(P_n + \epsilon_n)$$

and the relative number is

(2')  $$\frac{N_1}{N}(P_1 + \epsilon_1) + \frac{N_2}{N}(P_2 + \epsilon_2) + \cdots + \frac{N_n}{N}(P_n + \epsilon_n).$$

If N increases indefinitely a part of the $N_\nu$ must become infinite. For these classes, $\epsilon_\nu$ converges toward zero. For the other classes $N_\nu/N$ diminishes to zero. Thus, the relative number of right decisions converges towards

(3)  $$\frac{1}{N}(N_1P_1 + N_2P_2 + \cdots + N_nP_n).$$

The $N_\nu$ are unknown. Every one of these unknowns can take each value from zero to N. If $P_\mu$ is the smallest $P_\nu$, the most unfavorable case, where the expression (3) has its smallest value, will occur with $N_\mu = N$, all other $N_\nu$ being zero. This value is obviously $P_\mu$. Thus it is seen that the frequency of correct assignments is at least equal to the smallest $P_\nu$ which may be written as $P_{\min}$. The greatest risk of making a false decision is $1 - P_{\min}$.

Now the problem to be solved in the present paper can be stated as follows: For n given densities $p_\nu(x)$, find the subdivision of the x-space into n regions $R_\nu$ that gives to the smallest of the expressions $P_\nu$ defined in (1) its greatest possible value.

This problem has the type of a continuous variation problem with the integrals in question bounded within the limits zero to one. We may, therefore, assume that under reasonable restrictions for $p_\nu(x)$ a solution exists. Uniqueness of the solution cannot be expected in general. It seems very difficult to establish the conditions for uniqueness in other than the most simple cases. Existence of more than one solution would mean that each of them is an optimum with respect to infinitesimal modifications of the boundaries.
2. General solution. A simple problem of variation is considered as solved in principle when the nature of the extremals is known. In our case of a so-called minimax problem, where the minimum of n quantities is maximized, an additional relation between the n integrals is required. Both can easily be found in the actual case.

Let us first consider a partition of the x-space into n regions with not all $P_\nu$ being equal. The smallest $P_\nu$ will be called $P_{\min}$ and the smallest but one $P^*$. Among the k regions for which $P_\nu = P_{\min}$ there will be at least one, say, $R_\alpha$ that has a common border with a region $R_\beta$ whose P-value is greater, so that $P_\beta \ge P^*$. Now modify the boundary between $R_\alpha$ and $R_\beta$ in such a way that the space covered by $R_\alpha$ is increased and that of $R_\beta$ decreased. According to (1) the new values of $P_\alpha$ and $P_\beta$ will be

(4)  $$P'_\alpha = P_\alpha + \Delta, \qquad P'_\beta = P_\beta - \Delta'$$

with both Δ and Δ' positive. The two quantities Δ and Δ' are not independent of one another, but they can be chosen both smaller than any given positive number ε. Therefore, the condition

(5)  $$P'_\alpha = P_\alpha + \Delta < P_\beta - \Delta' = P'_\beta$$

can be fulfilled. All other $P_\nu$-values remain unchanged.

In the case k = 1, that is, if only one region $R_\nu$ had originally the minimum P-value, the modified system has a greater minimum P, which equals either $P_\alpha + \Delta$ or $P^*$. If k > 1 the new system has the same minimum P as the original one, but its k-value is diminished by one. If we repeat the same procedure (k − 1) times we obtain a system of regions with one single $P_\nu$ having the minimum P-value and the next step leads to a partition of the x-space into n regions with a smallest P-value that is greater than the original $P_{\min}$. Thus it is seen that no partition with unequal $P_\nu$-values can solve our problem.

Secondly, if m > 1, consider a system of n regions with $P = P_1 = P_2 = \cdots = P_n$. Take two points, x and y, on the border of any two neighboring regions $R_\nu$ and $R_\mu$. An infinitesimal variation of the boundary would consist of adding to $R_\nu$ in the neighborhood of the point x a space element δS, subtracting it from $R_\mu$ and, at the same time, adding to $R_\mu$ in the vicinity of y an element δS', subtracting it from $R_\nu$. Then, according to (1), the new values of $P_\nu$ and $P_\mu$ will be

(6)  $$P'_\nu = P + p_\nu(x)\,\delta S - p_\nu(y)\,\delta S', \qquad P'_\mu = P - p_\mu(x)\,\delta S + p_\mu(y)\,\delta S'.$$

Introducing $\Delta_\nu = P'_\nu - P$ and $\Delta_\mu = P'_\mu - P$, these equations solved for δS and δS' give

(7)  $$\delta S = \frac{p_\mu(y)\Delta_\nu + p_\nu(y)\Delta_\mu}{D}, \qquad \delta S' = \frac{p_\mu(x)\Delta_\nu + p_\nu(x)\Delta_\mu}{D}$$

where

(7')  $$D = p_\nu(x)p_\mu(y) - p_\mu(x)p_\nu(y).$$
If the determinant D is positive, we find two positive quantities δS and δS' for any pair of positive $\Delta_\mu$ and $\Delta_\nu$. If D is negative the same is true when x and y are interchanged. In both cases, that is, with D ≠ 0, the original partition is replaced by a new system of regions in which only two regions, $R_\nu$ and $R_\mu$, have increased P-values, while (if n > 2) still $P_{\min} = P$. If to this system the procedure as described in the foregoing is applied, a final partition with a greater minimum value of P can be derived. The conclusion is that no solution of our problem can include a boundary on which the determinant D is different from zero for any two points x and y. On the other hand, it is seen that D = 0 means that the ratio $p_\nu(x) : p_\mu(x)$ has a constant value along the border. Thus the result is reached:

The partition of the x-space that solves our problem is characterized by two properties: (1) for all n regions $R_\nu$ the value of $P_\nu$ is the same; (2) along the border between $R_\nu$ and $R_\mu$ the ratio $p_\nu(x)/p_\mu(x)$ is constant.

In the one-dimensional case (m = 1) only the first of these two statements is relevant. In any case, the success rate, that is, the guaranteed ratio of correct decisions, equals the common value of all $P_\nu$.

3. Illustrations. (a) One-dimensional case. Upon introducing the cumulative
distribution functions

(8) $F_\nu(x) = \int_{-\infty}^{x} p_\nu(z)\,dz$

the conditions $P_1 = P_2 = \cdots = P_n$ take the form

(9) $F_1(x_1) = F_2(x_2) - F_2(x_1) = \cdots = F_{n-1}(x_{n-1}) - F_{n-1}(x_{n-2}) = 1 - F_n(x_{n-1})$

where $x_1, x_2, \ldots, x_{n-1}$ determine the $n$ intervals on the x-axis, which is infinite
in both directions. If all density functions have the same form except for an
affine transformation, one has

(10) $F_\nu(x) = F[h_\nu(x - \vartheta_\nu)], \qquad \nu = 1, 2, \ldots, n$.

Let us assume, for instance, that scores between 0 and 100 are attributed to
three types of individuals. The first type may have an even chance to obtain a
score between 0 and 50, the second between 40 and 80 and the third between
70 and 100. Here

(11) $F_\nu(x) = \tfrac{1}{2} + (x - \vartheta_\nu)\,p_\nu, \qquad |x - \vartheta_\nu| \le \dfrac{1}{2p_\nu}$,

with $\vartheta_\nu = 25, 60, 85$ and $p_\nu = \tfrac{1}{50}, \tfrac{1}{40}, \tfrac{1}{30}$. The conditions (9) supply

(12) $\tfrac{1}{2} + \dfrac{x_1 - 25}{50} = \dfrac{1}{40}(x_2 - x_1) = \tfrac{1}{2} - \dfrac{x_2 - 85}{30}$

and this, solved for $x_1$, $x_2$, gives $x_1 = 41\tfrac{2}{3}$, $x_2 = 75$, while the three expressions
(12) take the value 0.833. Therefore, in attributing all scores below $41\tfrac{2}{3}$ to the
first class and all scores beyond 75 to the third, one is safe to make under no
circumstances more than 1/6 incorrect decisions in the long run.
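The arithmetic of this example can be retraced with a minimal sketch. It assumes, as in the text, three uniform score ranges ordered from lowest to highest and cut points that both fall inside the middle range; the function name `three_uniform_cutpoints` is only illustrative.

```python
from fractions import Fraction

def three_uniform_cutpoints(supports):
    """supports: three (low, high) score ranges, ordered from lowest to highest class."""
    (lo1, hi1), (lo2, hi2), (lo3, hi3) = [(Fraction(a), Fraction(b)) for a, b in supports]
    w1, w2, w3 = hi1 - lo1, hi2 - lo2, hi3 - lo3
    # F1(x1) = c gives x1 = lo1 + c*w1; 1 - F3(x2) = c gives x2 = hi3 - c*w3.
    # Substituting into F2(x2) - F2(x1) = (x2 - x1)/w2 = c gives a linear equation in c.
    c = (hi3 - lo1) / (w1 + w2 + w3)
    x1 = lo1 + c * w1
    x2 = hi3 - c * w3
    # The linear forms used above hold only if the cut points lie in the assumed ranges.
    assert lo2 <= x1 <= min(hi1, hi2) and max(lo2, lo3) <= x2 <= hi2
    return x1, x2, c

x1, x2, c = three_uniform_cutpoints([(0, 50), (40, 80), (70, 100)])
print(x1, x2, c)  # 125/3 75 5/6
```

The common value 5/6 is the guaranteed success rate, so the long-run share of incorrect decisions cannot exceed 1/6.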
In the example quoted in the introduction one has

(13) $p_\nu(x) = \dfrac{1}{\sigma_\nu\sqrt{2\pi}}\; e^{-(x - a_\nu)^2 / 2\sigma_\nu^2}$

with $a_\nu = 75, 50, 25$ and $\sigma_\nu^2 = 8, 32, 72$. If $\Phi(x)$ denotes the integral

$\Phi(x) = \dfrac{2}{\sqrt{\pi}} \int_0^x e^{-z^2}\,dz$

the conditions (9) become

(14) $1 + \Phi\!\left(\dfrac{x_1 - 25}{12}\right) = \Phi\!\left(\dfrac{x_2 - 50}{8}\right) - \Phi\!\left(\dfrac{x_1 - 50}{8}\right) = 1 - \Phi\!\left(\dfrac{x_2 - 75}{4}\right)$.

The first and last expression equated lead to $x_1 + 3x_2 = 250$. The complete
solution can be found with the help of tables for $\Phi$. It is $x_1 = 39.9920$, $x_2 =
70.0027$ with the common value twice 0.961 for the three expressions (14).
Hence the result as quoted in the introduction.
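The table lookup can be replaced by a short numerical sketch. It assumes three normal classes listed from lowest to highest mean (here means 25, 50, 75 with variances 72, 32, 8) and equalizes the three success probabilities by bisection; `normal_cdf` and `three_normal_cutpoints` are illustrative names, and the bisection is simply one convenient root finder.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def three_normal_cutpoints(means, variances, iterations=200):
    m_lo, m_mid, m_hi = means
    s_lo, s_mid, s_hi = (math.sqrt(v) for v in variances)

    def cuts(u):
        # Both outer classes get success probability normal_cdf(u).
        return m_lo + s_lo * u, m_hi - s_hi * u

    def gap(u):
        x1, x2 = cuts(u)
        middle = normal_cdf((x2 - m_mid) / s_mid) - normal_cdf((x1 - m_mid) / s_mid)
        return middle - normal_cdf(u)

    lo, hi = 0.0, (m_hi - m_lo) / (s_lo + s_hi)  # gap(lo) > 0 > gap(hi) here
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    x1, x2 = cuts(u)
    return x1, x2, normal_cdf(u)

x1, x2, success = three_normal_cutpoints((25.0, 50.0, 75.0), (72.0, 32.0, 8.0))
print(round(x1, 3), round(x2, 3), round(success, 3))
```

The returned cut points satisfy $x_1 + 3x_2 = 250$ up to rounding, and the common success probability is about 0.961.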

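Let us now take up the case of six normal distributions with equidistant
mean values $a_\nu = \pm a, \pm 3a, \pm 5a$ and one and the same variance $\sigma^2$. Then,
because of symmetry, two equations only have to be fulfilled:

$1 + \Phi\!\left(\dfrac{x_1 + 5a}{\sigma\sqrt{2}}\right) = \Phi\!\left(\dfrac{x_2 + 3a}{\sigma\sqrt{2}}\right) - \Phi\!\left(\dfrac{x_1 + 3a}{\sigma\sqrt{2}}\right) = \Phi\!\left(\dfrac{a}{\sigma\sqrt{2}}\right) - \Phi\!\left(\dfrac{x_2 + a}{\sigma\sqrt{2}}\right)$.

For $\sigma^2/a^2 = 0.32$, the numerical solution gives

$x_1 = -4.160a, \qquad x_2 = -2.062a$.

The success rate, i.e. half the common value of the above expressions, is 0.931.
The six intervals extend from $-\infty$ to $x_1$, from $x_1$ to $x_2$, from $x_2$ to 0, from 0 to
$-x_2$, from $-x_2$ to $-x_1$, and from $-x_1$ to $\infty$.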

(b) Case of more than one dimension. Let us assume that two classes A and
B have uniform distributions extending over volumes $V_1 = 1/p_1$ and $V_2 = 1/p_2$
respectively. If the two regions have a common part of volume $V$, each surface
within the common space fulfills the condition $p_1/p_2 = \text{constant}$. Thus, the
two regions $R_1$ and $R_2$ are not uniquely determined but subject to one condition
only, which determines the optimum success rate. If $kV$ is cut out from $V_1$ and
$(1 - k)V$ from $V_2$, the relation must be fulfilled:

$1 - p_1 V k = 1 - p_2 V (1 - k)$, i.e. $k = \dfrac{p_2}{p_1 + p_2}$,

and the success rate is

$S = 1 - p_1 V k = 1 - \dfrac{p_1 p_2 V}{p_1 + p_2} = 1 - p_2 V (1 - k)$.

If three classes A, B, and C are considered with the densities $p_1 = 1/V_1$, $p_2 =
1/V_2$, $p_3 = 1/V_3$ and the first two regions have a space of volume $V$ in common,
the latter two a space of volume $V'$, the conditions are

$1 - p_1 V (1 - \kappa) = 1 - p_2 (\kappa V + \lambda V') = 1 - p_3 (1 - \lambda) V'$

which supply

$\kappa = 1 - \dfrac{p_2 p_3}{p_1 p_2 + p_2 p_3 + p_3 p_1} \cdot \dfrac{V + V'}{V}$,

$\lambda = 1 - \dfrac{p_1 p_2}{p_1 p_2 + p_2 p_3 + p_3 p_1} \cdot \dfrac{V + V'}{V'}$,

and the success rate has the value

$S = 1 - (V + V')\,\dfrac{p_1 p_2 p_3}{p_1 p_2 + p_2 p_3 + p_3 p_1}$.
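As a check on the three-class formulas, the following sketch evaluates the two splitting fractions and the success rate with exact fractions and confirms that the three conditions give one common value. The densities and volumes are hypothetical illustration values, not data from the paper.

```python
from fractions import Fraction

def three_class_uniform(p1, p2, p3, V, Vp):
    """Splitting fractions (kappa, lam) and success rate S for three uniform classes."""
    total = p1 * p2 + p2 * p3 + p3 * p1
    kappa = 1 - (p2 * p3 / total) * (V + Vp) / V
    lam = 1 - (p1 * p2 / total) * (V + Vp) / Vp
    S = 1 - (V + Vp) * p1 * p2 * p3 / total
    return kappa, lam, S

# Hypothetical example: V1 = 10, V2 = 8, V3 = 12, overlaps V = 2 and V' = 3.
p1, p2, p3 = Fraction(1, 10), Fraction(1, 8), Fraction(1, 12)
V, Vp = Fraction(2), Fraction(3)
kappa, lam, S = three_class_uniform(p1, p2, p3, V, Vp)

# The three success probabilities from the conditions must coincide with S.
P1 = 1 - p1 * V * (1 - kappa)
P2 = 1 - p2 * (kappa * V + lam * Vp)
P3 = 1 - p3 * (1 - lam) * Vp
print(kappa, lam, S)  # 1/6 1/3 5/6
```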

If the $p_\nu$ are normal density functions, say

$p_\nu(x, y) = \dfrac{\sqrt{D_\nu}}{\pi}\; e^{-Q_\nu}$,

$Q_\nu = \alpha_\nu (x - a_\nu)^2 + 2\beta_\nu (x - a_\nu)(y - b_\nu) + \gamma_\nu (y - b_\nu)^2$

and $D_\nu = \alpha_\nu\gamma_\nu - \beta_\nu^2$ the corresponding determinants, the curves separating the regions $R_\nu$
are the conics

$Q_\nu - Q_\mu = \text{const.}$

where the constants are determined by the conditions that all $P_\nu$ must be equal.
If the $\alpha$, $\beta$, $\gamma$ have the same values for every $\nu$, the borders consist of straight
lines. In this case one can reduce the expressions for $p_\nu$, by an affine transformation,
to

$p_\nu(x, y) = \dfrac{1}{\pi}\; e^{-[(x - a_\nu)^2 + (y - b_\nu)^2]}$.

In the transformed plane the borderline between the regions $R_\nu$ and $R_\mu$ is perpendicular
to the straight line that connects the points $A_\nu(a_\nu, b_\nu)$ and $A_\mu(a_\mu, b_\mu)$.
If all points $A_\nu$ lie on the same straight line (in particular, if $n = 2$) the
whole problem is practically identical with the one-dimensional one ($m = 1$). In
the case $n = 3$, in general, the three regions are confined by three lines perpendicular
to $A_1A_2$, $A_2A_3$, $A_3A_1$ passing through a point $C$ whose coordinates
are determined by the equations $P_1 = P_2 = P_3$. If $r_\nu$ denotes the distance
$A_\nu C$ and $\varphi_\nu$, $\psi_\nu$ are the angles that $A_\nu C$ forms with the adjacent sides of the triangle
$A_1A_2A_3$, one has to use the function

$F(r, \varphi) = \dfrac{1}{2\sqrt{\pi}} \int_0^\infty \Phi(r - z\tan\varphi)\; e^{-z^2}\,dz$.

Then the two conditions for $C$ read

$F(r_1, \varphi_1) + F(r_1, \psi_1) = F(r_2, \varphi_2) + F(r_2, \psi_2) = F(r_3, \varphi_3) + F(r_3, \psi_3)$

and the success rate equals 0.5 plus the common value of these three expressions.



ON AN EXTENSION OF THE CONCEPT OF MOMENT WITH APPLICATIONS
TO MEASURES OF VARIABILITY, GENERAL
SIMILARITY, AND OVERLAPPING¹

Milton da Silva Rodrigues
State University of São Paulo

1. Introduction. Given a frequency distribution $D: [X_i, F_i]$ ($i =
1, 2, 3, \ldots, n$), we shall call the expression

$M_r(D, X_j) = \sum_{i=1}^{n} (X_i - X_j)^r F_i$

the rth total moment of $D$ about the origin $X_j$. We shall consider the weighted
sum

$\mathfrak{M}_r = \sum_j W_j\, M_r(D, X_j)$

where $W_j$ denotes the weight corresponding to the particular origin $X_j$, and the
summation is over a field $\phi$. In particular, if $\phi$ is the set of all values assumed
in $D$ by the variate $X_i$, and if $W_j = F_j$, we shall call the quantity $\mathfrak{M}_r$ the rth complete
total moment of $D$. If, on the contrary, $W_j$ is the frequency $F'_j$ of the value
$X'_j$ in a second frequency distribution $D': [X'_j, F'_j]$ and $\phi'$ is the set of all values
assumed by the variate $X'_j$ in $D'$, $\mathfrak{M}_r$ will be called the rth aggregate moment
of $D$ and $D'$. A modification of this procedure leads to what we shall call the
moment of transvariation of $D$ and $D'$.

The consideration of complete moments draws attention to certain previously
known measures of variability which are independent of the origin selected,
and also provides simple methods of computation which are useful for data
given in the form of a frequency distribution. The investigation of aggregate
moments and moments of transvariation gives rise to certain measures of general
similarity between two distributions, as well as measures of the amount of overlapping.

2. Sliding and complete moments of a frequency distribution.

2.1. We shall give the name sliding total moments of order r to the successive
values, for particular values of $j$, of the expression

(2.11) $M_r(X_j) = F_j \sum_{i=1}^{n} [(X_i - X_j)^r F_i]$.


¹ The Portuguese original of this paper was written in Brazil, in August 1943. Its translation
into English was entirely revised by Dr. T. Greville, Bureau of the Census, who proposed
also many simplifications in the derivation of formulae. For his painstaking labor
and interest I wish to express my very sincere appreciation. I also wish to thank Dr.
W. Edwards Deming for reading the manuscript and making several valuable suggestions.
The expression for the complete total moment, written out in full, is

(2.12) $\mathfrak{M}_r = \sum_{j=1}^{n} M_r(X_j) = \sum_{j=1}^{n} \sum_{i=1}^{n} \{(X_i - X_j)^r F_i F_j\}$.

It is readily seen that the complete moment is independent of the choice of
origin.

2.2. If $r = 0$, we have

$M_0(X_j) = F_j \sum_{i=1}^{n} F_i$.

The complete total moment of order zero will therefore be

(2.21) $\mathfrak{M}_0 = \sum_{j=1}^{n} F_j \sum_{i=1}^{n} F_i = M_0^2$

where $M_0$ stands for the total moment of order zero about the origin of the $X$,
that is,

$M_0 = N\nu'_0$.

2.3. If $r = 1$, we shall have

$M_1(X_j) = F_j \sum_{i=1}^{n} [(X_i - X_j) F_i]$.

Using $M_1$ to denote the total moment of order one about the origin of the $X$,
we obtain

$M_1(X_j) = F_j \sum_i X_i F_i - X_j F_j \sum_i F_i = F_j M_1 - X_j F_j M_0$.

Making $j$ vary from 1 to $n$ and summing, we have

(2.31) $\mathfrak{M}_1 = \sum_{j=1}^{n} F_j M_1 - \sum_{j=1}^{n} X_j F_j M_0 = M_0 M_1 - M_1 M_0 = 0$.

This result is due to the fact that we took the deviations $X_i - X_j$ with their
proper signs. We may, however, calculate the value which the complete moment
of first order would have if absolute values were used. The sliding total
moment thus modified becomes

$|M_1(X_j)| = F_j \left[ \sum_{i=1}^{j-1} (X_j - X_i) F_i + \sum_{i=j}^{n} (X_i - X_j) F_i \right]$

which may be put in the form

(2.32) $|M_1(X_j)| = F_j X_j \left[ \sum_{i=1}^{j-1} F_i - \sum_{i=j}^{n} F_i \right] - F_j \left[ \sum_{i=1}^{j-1} F_i X_i - \sum_{i=j}^{n} F_i X_i \right]$.

Summing with respect to $j$ and employing the substitutions

(2.33) $\sum_{i=j}^{n} F_i = M_0 - \sum_{i=1}^{j-1} F_i, \qquad \sum_{i=j}^{n} F_i X_i = M_1 - \sum_{i=1}^{j-1} F_i X_i$

gives for the complete total moment

(2.34) $|\mathfrak{M}_1| = 2 \sum_{j=1}^{n} \left[ F_j X_j \sum_{i=1}^{j-1} F_i \right] - 2 \sum_{j=1}^{n} \left[ F_j \sum_{i=1}^{j-1} F_i X_i \right]$.

The quotient

(2.35) $\mathfrak{m}_1 = \dfrac{|\mathfrak{M}_1|}{\mathfrak{M}_0}$

of the complete total moment of order one by the complete total moment of order
zero we shall call the complete unit moment of order one, or simply the complete
moment of order one, when no confusion would result.

The complete unit moment is a measure of variability, identical with that
already considered by Andrae and Helmert, respectively in 1869 and in 1876,
and which C. Gini, in 1912, called mean difference with repetition.²

The numerator of $\mathfrak{m}_1$ is easily computed if we observe that the upper limit $j - 1$
of the $F_i$ summation, for example, means that each product $X_j F_j$ must be multiplied
by the cumulative frequency corresponding to the class immediately preceding.
We only have to shift the cumulative frequencies column by one class
in the proper direction; the second term is similarly dealt with.
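The shifted-cumulative-frequency computation can be sketched as follows. The example assumes the values are listed in ascending order; the data are hypothetical, and a brute-force double sum over all pairs is included only as a check.

```python
from itertools import accumulate

def complete_first_moment(values, freqs):
    """Return (|M_1| complete total moment, complete unit moment m_1) via formula (2.34).

    values must be in ascending order; freqs are the matching frequencies.
    """
    # cum_f[j] is the sum of F_i over classes preceding class j (shifted by one class).
    cum_f = [0] + list(accumulate(freqs))
    cum_fx = [0] + list(accumulate(f * x for x, f in zip(values, freqs)))
    total = 2 * sum(
        f * x * cum_f[j] - f * cum_fx[j]
        for j, (x, f) in enumerate(zip(values, freqs))
    )
    n = sum(freqs)
    return total, total / n ** 2

def brute_force_first_moment(values, freqs):
    """Direct double sum of F_i F_j |X_i - X_j|, used only for comparison."""
    return sum(
        fi * fj * abs(xi - xj)
        for xi, fi in zip(values, freqs)
        for xj, fj in zip(values, freqs)
    )

values, freqs = [1, 2, 4, 7], [3, 5, 2, 1]
total, unit = complete_first_moment(values, freqs)
print(total, brute_force_first_moment(values, freqs))  # 204 204
```

The quotient `unit` is the mean difference with repetition for the sample data.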


2.4. The second order sliding total moment is

$M_2(X_j) = F_j \sum_{i=1}^{n} [(X_i - X_j)^2 F_i] = F_j M_2 - 2 F_j X_j M_1 + F_j X_j^2 M_0$

where $M_2$ is the total moment of order two. Summing with respect to $j$ gives
the complete total moment of order two

(2.41) $\mathfrak{M}_2 = \sum_{j=1}^{n} M_2(X_j) = 2(M_2 M_0 - M_1^2)$.

The complete unit moment of order two is therefore

(2.42) $\mathfrak{m}_2 = 2(\nu'_2 - \nu'^{\,2}_1)$
² Apud Czuber, Wahrscheinlichkeitsrechnung, Vol. 2 (1932), p. 316. C. Gini, Variabilità
e Mutabilità, Cagliari, 1912.
where $\nu'$ stands for a unit moment about the origin of the $X$, namely

$\nu'_r = \dfrac{\sum X^r F}{\sum F}$.

$\mathfrak{m}_2$ is also a measure of variability, independent of the choice of origin. It is
equal to the square of Gauss's "Präzisionsmass", and to the double of Fisher's
variance; like $\mathfrak{m}_1$ it was defined by Andrae and Helmert, and was called by Gini
the mean square difference with repetition.

2.5. If $r = 3$ we have for the sliding moments,

$M_3(X_j) = F_j \sum_{i=1}^{n} (X_i - X_j)^3 F_i$

$= F_j M_3 - 3 F_j X_j M_2 + 3 F_j X_j^2 M_1 - F_j X_j^3 M_0$.

Summation over $j$ gives

(2.51) $\mathfrak{M}_3 = \sum_{j=1}^{n} M_3(X_j) = M_0 M_3 - 3 M_1 M_2 + 3 M_2 M_1 - M_3 M_0 = 0$,

a result which is easily shown to hold for any complete moment of odd order.
We may calculate the value of the complete moment of order three using absolute
values of the deviations $X_i - X_j$ by a process similar to that previously described
for the calculation of $|\mathfrak{M}_1|$. This gives

(2.52) $|\mathfrak{M}_3| = 2 \left[ \sum_{j=1}^{n} F_j X_j^3 \sum_{i=1}^{j-1} F_i - 3 \sum_{j=1}^{n} F_j X_j^2 \sum_{i=1}^{j-1} F_i X_i + 3 \sum_{j=1}^{n} F_j X_j \sum_{i=1}^{j-1} F_i X_i^2 - \sum_{j=1}^{n} F_j \sum_{i=1}^{j-1} F_i X_i^3 \right]$.

2.6. The sliding moments of order four are

$M_4(X_j) = F_j M_4 - 4 F_j X_j M_3 + 6 F_j X_j^2 M_2 - 4 F_j X_j^3 M_1 + F_j X_j^4 M_0$.

Summing with respect to $j$ and simplifying, we have

(2.61) $\mathfrak{M}_4 = M_0 M_4 - 4 M_1 M_3 + 6 M_2^2 - 4 M_3 M_1 + M_4 M_0$

$= 2(M_0 M_4 - 4 M_1 M_3 + 3 M_2^2)$.

Dividing both sides by $\mathfrak{M}_0$ in order to obtain the complete moment on a unit
basis, we have

$\mathfrak{m}_4 = 2\left[ \dfrac{M_4}{M_0} - 4\,\dfrac{M_1 M_3}{M_0^2} + 3\left(\dfrac{M_2}{M_0}\right)^2 \right] = 2(\nu'_4 - 4\nu'_1\nu'_3 + 3\nu'^{\,2}_2)$.

But, if $\nu$ indicates a moment about the mean,

$\nu_4 = \nu'_4 - 4\nu'_1\nu'_3 + 6\nu'^{\,2}_1\nu'_2 - 3\nu'^{\,4}_1$.

By substitution, therefore,

$\mathfrak{m}_4 = 2(\nu_4 + 3\nu'^{\,2}_2 - 6\nu'^{\,2}_1\nu'_2 + 3\nu'^{\,4}_1)$

(2.62) $= 2[\nu_4 + 3(\nu'_2 - \nu'^{\,2}_1)^2]$

$= 2(\nu_4 + 3\nu_2^2)$.

This complete moment gives rise to a measure of kurtosis independent of the
choice of origin

$\dfrac{\mathfrak{m}_4}{\mathfrak{m}_2^2} = \dfrac{\nu_4}{2\nu_2^2} + \dfrac{3}{2}$.

In case of mesokurtosis this reduces to 3, since for the normal curve $\nu_4/\nu_2^2 = 3$;
leptokurtosis and platykurtosis occur for the same ranges as in the case of Pearson's
measure $\beta_2$.
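The relations between the complete unit moments and the moments about the mean can be checked on a small frequency table. The sketch below uses exact fractions and hypothetical data; the helper names are illustrative only.

```python
from fractions import Fraction

def complete_unit_moment(values, freqs, r):
    """Signed complete unit moment of order r: sum F_i F_j (X_i - X_j)^r / (sum F)^2."""
    n = sum(freqs)
    total = sum(
        fi * fj * (xi - xj) ** r
        for xi, fi in zip(values, freqs)
        for xj, fj in zip(values, freqs)
    )
    return Fraction(total, n * n)

def central_moment(values, freqs, r):
    """Unit moment of order r about the arithmetic mean."""
    n = sum(freqs)
    mean = Fraction(sum(f * x for x, f in zip(values, freqs)), n)
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n

values, freqs = [1, 2, 4, 7], [3, 5, 2, 1]
nu2 = central_moment(values, freqs, 2)
nu4 = central_moment(values, freqs, 4)
m2 = complete_unit_moment(values, freqs, 2)
m3 = complete_unit_moment(values, freqs, 3)
m4 = complete_unit_moment(values, freqs, 4)

print(m2 == 2 * nu2, m3 == 0, m4 == 2 * (nu4 + 3 * nu2 ** 2))
kurtosis_index = m4 / m2 ** 2  # equals nu4 / (2 nu2^2) + 3/2
```

Because the arithmetic is exact, the equalities hold identically and not merely to rounding error.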

3. Aggregate moments of two frequency distributions.

3.1. Given two frequency distributions, $D: [X_i, F_i]$ ($i = 1, 2, 3, \ldots, n$) and
$D': [X'_j, F'_j]$ ($j = 1, 2, 3, \ldots, p$) and a fixed point $X'_j$ belonging to the second
distribution, we shall call the expression

(3.11) $M_r(D, X'_j) = F'_j \sum_{i=1}^{n} (X_i - X'_j)^r F_i$

the rth aggregate sliding total moment of the first distribution about the element
$X'_j$ of the second. Summation over $j$ gives

(3.12) ${}^{c}\mathfrak{M}_r = \sum_{j=1}^{p} \sum_{i=1}^{n} F'_j (X_i - X'_j)^r F_i$.

We shall call ${}^{c}\mathfrak{M}_r$ the aggregate complete total moment or, simply, the aggregate
total moment of $D$ about $D'$. It is clear that this is a symmetric function of the
two distributions, except for a change of sign in the case of odd moments.

3.2. If $r = 0$, we have

(3.21) $M_0(D, X'_j) = F'_j \sum_{i=1}^{n} F_i$

(3.22) ${}^{c}\mathfrak{M}_0 = \sum_{j=1}^{p} F'_j \sum_{i=1}^{n} F_i = M_0 M'_0$.

3.3. If $r = 1$, we have

(3.31) $M_1(D, X'_j) = F'_j M_1 - F'_j X'_j M_0$

(3.32) ${}^{c}\mathfrak{M}_1 = M_1 M'_0 - M_0 M'_1$.

We shall call the quotient

(3.33) ${}^{c}\mathfrak{m}_r = \dfrac{{}^{c}\mathfrak{M}_r}{{}^{c}\mathfrak{M}_0}$

the aggregate unit moment of order r (or the aggregate moment coefficient),
or simply the aggregate moment of order r whenever the simpler name will not
cause confusion.

It is obvious that the aggregate moments are measures of general similarity,
as to form and position, between $D$ and $D'$. This similarity will be an identity
in case the two distributions coincide perfectly; on the other hand, it is clear that
there is no limit to the degree of non-similarity which may be encountered. We
shall take unity to represent the maximum and zero the minimum of similarity,
and thus define a provisional similarity index

(3.34) $S = \dfrac{\mathfrak{m}_1\, \mathfrak{m}'_1}{{}^{c}\mathfrak{m}_1^{\,2}}$.

But

${}^{c}\mathfrak{m}_1 = \dfrac{M_1 M'_0 - M_0 M'_1}{M_0 M'_0} = A - A'$

where $A$ and $A'$ stand for the arithmetic means of $D$ and $D'$, respectively. Now
it will be seen that if $A = A'$, $S = \infty$. This result is due to the fact that in the
calculation of $\mathfrak{m}_1$ and $\mathfrak{m}'_1$ we took the absolute values of the deviations $X_i - X_j$,
while in the calculation of ${}^{c}\mathfrak{m}_1$ we retained the algebraic signs. In order to make
the two terms of the fraction in (3.34) comparable, we can either: 1) calculate
${}^{c}\mathfrak{m}_1$ also using absolute values; or 2) take only the positive or only the negative
part of both numerator and denominator of $S$. In any case, $A = A'$ is a necessary
condition for the maximum of $S$.


3.4. We shall employ the first method suggested above, although we shall
return to the second in the third part of the paper. As long as $D$ and $D'$ do not
overlap, all the $X_i - X'_j$ deviations have the same sign and this is the same as
that of the difference $A - A'$. If, however, there is some overlapping this will
not be the case, some deviations having different signs from that of $A - A'$.
This brings us to Gini's concept of "transvariation". He applies this term to
any deviation $X_i - X'_j$ which does not have the same sign as $\bar{X} - \bar{X}'$, these
symbols denoting averages of any previously specified type; and he calls the
magnitude of the deviation its "intensity".

In computing the complete moment of the first order using absolute values,
in order to simplify the algebra we shall assume the same origin for $X$ and $X'$
and therefore drop the stroke from the $X$, but not of course from the $F$.
If certain values of $X$ occur in one distribution and not in the other, we can
merely consider the frequency as zero in the second distribution. In this way
the two distributions can be regarded as extending over the same total range.
If $X_1$ and $X_m$ denote the extreme values, the sliding total moment is

$|M_1(D, X_j)| = F'_j \left[ \sum_{i=1}^{j-1} (X_j - X_i) F_i + \sum_{i=j}^{m} (X_i - X_j) F_i \right]$.

Summing with respect to $j$ and at the same time employing the substitutions
(2.33) or their transposed form, we obtain the following alternative expressions
for the complete aggregate moment:

(3.41) $|{}^{c}\mathfrak{M}_1| = M_1 M'_0 - M_0 M'_1 + 2 \sum_{j=1}^{m} \left[ F'_j X_j \sum_{i=1}^{j-1} F_i \right] - 2 \sum_{j=1}^{m} \left[ F'_j \sum_{i=1}^{j-1} F_i X_i \right]$

(3.42) $|{}^{c}\mathfrak{M}_1| = M_0 M'_1 - M_1 M'_0 - 2 \sum_{j=1}^{m} \left[ F'_j X_j \sum_{i=j}^{m} F_i \right] + 2 \sum_{j=1}^{m} \left[ F'_j \sum_{i=j}^{m} F_i X_i \right]$

Note the similarity of the first of these forms to formula (2.34) which is in fact
a particular case of formula (3.41). Alternatively, we may obtain from formula
(3.42) the particular case

(2.34a) $|\mathfrak{M}_1| = 2 \sum_{j=1}^{n} \left[ F_j \sum_{i=j}^{n} F_i X_i \right] - 2 \sum_{j=1}^{n} \left[ F_j X_j \sum_{i=j}^{n} F_i \right]$

which is equivalent to (2.34).

If the two distributions do not overlap, $|{}^{c}\mathfrak{M}_1|$ does not differ numerically
from ${}^{c}\mathfrak{M}_1$. Let us consider the case in which there is actual overlapping, the
range of non-zero frequencies extending from $X_1$ to $X_{n+p}$ for $D$ and from $X_{n+1}$ to
$X_m$ for $D'$. Then formula (3.42) becomes, upon merely dropping all vanishing
terms

(3.43) $|{}^{c}\mathfrak{M}_1| = M_0 M'_1 - M_1 M'_0 - 2 \sum_{j=n+1}^{n+p} \left[ F'_j X_j \sum_{i=j}^{n+p} F_i \right] + 2 \sum_{j=n+1}^{n+p} \left[ F'_j \sum_{i=j}^{n+p} F_i X_i \right]$.

On the other hand, formula (3.41) reduces, under the same circumstances, to a
much less simple expression, which upon making the substitutions (2.33) and
simplifying reduces to

(3.44) $|{}^{c}\mathfrak{M}_1| = M_0 M'_1 - M_1 M'_0 + 2 \sum_{j=n+1}^{n+p} \left[ F'_j X_j \sum_{i=n+1}^{j-1} F_i \right] - 2 \sum_{j=n+1}^{n+p} \left[ F'_j \sum_{i=n+1}^{j-1} F_i X_i \right]$

$\qquad - 2 \sum_{j=n+1}^{n+p} F'_j X_j \sum_{i=n+1}^{n+p} F_i + 2 \sum_{j=n+1}^{n+p} F'_j \sum_{i=n+1}^{n+p} F_i X_i$.

This result may be arrived at somewhat more easily by merely making the substitutions
(2.33) directly in formula (3.43). It may be noted that formula
(3.44) at once reduces to the form (2.34) if the two distributions are identical,
since the additional terms all cancel. It is, however, a less satisfactory result
than formula (3.43) because of the larger number of terms it contains. In order
to obtain a formula which resembles (2.34) more closely, we may reverse the
order of summation in formula (3.43). Observing that the terms for $j = i$
collectively vanish, we see that

(3.45) $|{}^{c}\mathfrak{M}_1| = M_0 M'_1 - M_1 M'_0 - 2 \sum_{i=n+1}^{n+p} \left[ F_i \sum_{j=n+1}^{i-1} F'_j X_j \right] + 2 \sum_{i=n+1}^{n+p} \left[ F_i X_i \sum_{j=n+1}^{i-1} F'_j \right]$.

It will be seen that the simple method of numerical computation described in
section 2.3 is immediately applicable to all the formulas (3.41) to (3.45). Dividing
any of these expressions by ${}^{c}\mathfrak{M}_0$ gives $|{}^{c}\mathfrak{m}_1|$. For example, if formula
(3.43) is used, we have

(3.46) $|{}^{c}\mathfrak{m}_1| = A' - A - \dfrac{2}{M_0 M'_0} \left\{ \sum_{j=n+1}^{n+p} \left[ F'_j X_j \sum_{i=j}^{n+p} F_i \right] - \sum_{j=n+1}^{n+p} \left[ F'_j \sum_{i=j}^{n+p} F_i X_i \right] \right\}$.

Substituting this value in equation (3.34), we have

(3.47) $S_1 = \dfrac{\mathfrak{m}_1\, \mathfrak{m}'_1}{|{}^{c}\mathfrak{m}_1|^2}$,

a quantity which we shall call the "mean coefficient of similarity."

We now observe that $S_1$ is a general measure of similarity whose magnitude
is affected by differences in either form or position. It may, however, be desirable
to eliminate the position element, in order to isolate the form aspect.
To do this it will suffice to relate the value which $|{}^{c}\mathfrak{m}_1|$ would have for $A = A'$,
to the product $\mathfrak{m}_1 \mathfrak{m}'_1$. This value of $|{}^{c}\mathfrak{m}_1|$ is, in fact, its minimum; denoting
it by $V_1$ we obtain the index

(3.48) $\mathfrak{S}_1 = \dfrac{\mathfrak{m}_1\, \mathfrak{m}'_1}{V_1^2}$

which we shall call the mean similarity ratio.

It is clear that all the above mentioned indices measure overlapping as well
as similarity. Overlapping between two distributions will be greatest when
their similarity is greatest, or when $|{}^{c}\mathfrak{m}_1|$ is a minimum. In order to bring
out more clearly the overlapping aspect we may follow Gini's procedure of contrasting
the actual value of a measure with its maximum value. As already
pointed out, if the form of the two distributions is held constant, but their relative
position is varied, the degree of overlapping, as measured by the mean similarity
ratio, is greatest when the arithmetic means coincide. This method of
procedure is embodied in the index

(3.49) $Z_1 = \dfrac{V_1}{|{}^{c}\mathfrak{m}_1|}$

which we shall call the "intensity of transvariation or overlapping." To calculate
$V_1$ we may, for example, merely add the difference $A' - A = c$ to the $X$
values, in order to move $D$ along the X-axis a distance of $c$, and then proceed to
calculate $|V_1|$ in the usual manner from the adjusted $X$ values.

3.5. If, in (3.11), $r = 2$, we have

$M_2(D, X'_j) = F'_j \sum_{i=1}^{n} (X_i - X'_j)^2 F_i$

$= F'_j M_2 - 2 X'_j F'_j M_1 + X'^{\,2}_j F'_j M_0$.

Summing for $j$ then gives

(3.51) ${}^{c}\mathfrak{M}_2 = M'_0 M_2 - 2 M'_1 M_1 + M'_2 M_0$.

We define the second aggregate unit moment as

(3.52) ${}^{c}\mathfrak{m}_2 = \dfrac{M_2}{M_0} - 2\,\dfrac{M_1}{M_0}\,\dfrac{M'_1}{M'_0} + \dfrac{M'_2}{M'_0} = \sigma^2 + \sigma'^2 + (A - A')^2$,

where the $\sigma$ and the $A$ stand for the standard deviations and the arithmetic
means of the respective distributions. Now we define the "mean square coefficient
of similarity" as the value of

(3.53) $S_2 = \dfrac{\mathfrak{m}_2\, \mathfrak{m}'_2}{{}^{c}\mathfrak{m}_2^{\,2}} = \dfrac{4\sigma^2\sigma'^2}{[\sigma^2 + \sigma'^2 + (A - A')^2]^2}$.

It is obvious that a maximum value of $S_2$ requires that $A = A'$, which is a necessary
condition for the maximum degree of overlapping. Maximum similarity requires,
in addition, $\sigma = \sigma'$, in which case $S_2 = 1$.

For a measure of similarity which is independent of difference in position between
the two distributions, we define

(3.54) $\mathfrak{S}_2 = \dfrac{\mathfrak{m}_2\, \mathfrak{m}'_2}{V_2^2}$

where $V_2$ is the minimum value of ${}^{c}\mathfrak{m}_2$ for all positions of the two distributions,
without changing their form. This is obtained by merely taking

(3.55) $V_2 = \sigma^2 + \sigma'^2$.

For a measure of overlapping we can follow Gini in contrasting the actual
value of ${}^{c}\mathfrak{m}_2$ with its minimum $V_2$, since the maximum of overlapping corresponds
to the minimum value of ${}^{c}\mathfrak{m}_2$. We thus set

(3.56) $Z_2 = \dfrac{V_2}{{}^{c}\mathfrak{m}_2} = \dfrac{\sigma^2 + \sigma'^2}{\sigma^2 + \sigma'^2 + (A - A')^2}$,

a measure which we shall call the "density of overlapping". Its maximum
value is unity.

It may be remarked that all the indices proposed in this paragraph are easier 
to calculate than those of paragraph 3.4. The individual terms are all functions 
of only one of the two distributions; yet the resulting indices are independent of 
the origin chosen, and therefore free from any criticism based on doubt as to the 
representativeness of the arithmetic mean, in cases of marked skewness. 
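The second-order indices are simple to evaluate from the two frequency tables. The sketch below, on hypothetical data, computes the aggregate unit moment of order two by a direct double sum, confirms that it equals $\sigma^2 + \sigma'^2 + (A - A')^2$, and then forms the mean square coefficient of similarity (3.53) and the density of overlapping (3.56). Function names are illustrative.

```python
from fractions import Fraction

def mean_and_variance(values, freqs):
    n = sum(freqs)
    mean = Fraction(sum(f * x for x, f in zip(values, freqs)), n)
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, var

def aggregate_unit_moment_2(values, freqs, values2, freqs2):
    """Double sum of F'_j F_i (X_i - X'_j)^2 divided by M_0 M'_0."""
    total = sum(
        f2 * f * (x - x2) ** 2
        for x, f in zip(values, freqs)
        for x2, f2 in zip(values2, freqs2)
    )
    return Fraction(total, sum(freqs) * sum(freqs2))

def second_order_indices(values, freqs, values2, freqs2):
    _, var = mean_and_variance(values, freqs)
    _, var2 = mean_and_variance(values2, freqs2)
    cm2 = aggregate_unit_moment_2(values, freqs, values2, freqs2)
    s2 = 4 * var * var2 / cm2 ** 2   # mean square coefficient of similarity
    z2 = (var + var2) / cm2          # density of overlapping
    return cm2, s2, z2

D = ([1, 2, 3], [1, 2, 1])    # hypothetical distribution D
D2 = ([2, 4, 6], [1, 1, 1])   # hypothetical distribution D'
cm2, s2, z2 = second_order_indices(*D, *D2)
print(cm2, s2, z2)  # 43/6 192/1849 19/43
```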


4. Positive and negative moments, and moments of transvariation.

4.1. The aggregate sliding total moment of two frequency distributions $D$
and $D'$ may be expressed in the form

(4.11) $M_r(D, X'_j) = F'_j \sum_{i=1}^{j} (X_i - X_j)^r F_i + F'_j \sum_{i=j+1}^{m} (X_i - X_j)^r F_i$

when both distributions have been artificially extended, if necessary, to cover
the same total range, as previously described in section 3.4. We shall characterize
the second term in the right member of (4.11) as the positive sliding
moment, and the absolute value of the first term as the negative sliding moment.
We shall denote these moments by ${}^{+}M_r(D, X_j)$ and ${}^{-}M_r(D, X_j)$. The complete
moments obtained by summing these separate terms over the range of values of
$j$ we shall call the positive and negative aggregate complete moments. Thus
the positive complete moment is

(4.12) ${}^{+}\mathfrak{M}_r = \sum_{j=1}^{m} \left[ F'_j \sum_{i=j+1}^{m} (X_i - X_j)^r F_i \right]$

and the negative complete moment is

(4.13) ${}^{-}\mathfrak{M}_r = \sum_{j=1}^{m} \left[ F'_j \sum_{i=1}^{j} (X_j - X_i)^r F_i \right]$.

That one of these two partial moments which is obtained from differences $X_i -
X'_j$ having the opposite sense to that of the difference $\bar{X} - \bar{X}'$ will be called the
moment of transvariation of the two distributions and will be denoted by the
symbol ${}^{T}\mathfrak{M}_r$. Here, as in section 3.4, $\bar{X}$ and $\bar{X}'$ denote averages of any previously
selected type. For example, if the arithmetic means are the averages
selected, and if $A - A'$ is positive, then the negative aggregate moment is the
moment of transvariation, and vice-versa.

In the trivial case in which the two distributions are identical, the positive
and negative complete moments are equal, and both reduce to merely one half
the aggregate complete moment (computed by the use of absolute values in the
case of moments of odd order).

The unit moment of transvariation will be defined as

(4.14) ${}^{T}\mathfrak{m}_r = \dfrac{{}^{T}\mathfrak{M}_r}{{}^{c}\mathfrak{M}_0}$.

4.2. It is evident that the moments of transvariation can be considered as
measures of overlapping. Any such moment equals zero when there is no overlapping
and becomes greatest when the two distributions coincide. Taking unity
to represent the maximum and zero the minimum of overlapping, we may choose
as a general measure of overlapping,

(4.21) $\dfrac{4\,({}^{T}\mathfrak{m}_r)^2}{\mathfrak{m}_r\, \mathfrak{m}'_r} = \dfrac{4\,({}^{T}\mathfrak{M}_r)^2}{|\mathfrak{M}_r|\,|\mathfrak{M}'_r|}$.

It will be seen that this quantity always equals zero when there is no overlapping,
and equals unity when there is complete overlapping: that is when the two distributions
are identical.
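The two limiting cases just stated can be checked numerically. The sketch below assumes both distributions are given on one common ascending grid of values (with zero frequencies where a value does not occur), takes the arithmetic means as the selected averages, and uses the ratio of four times the squared moment of transvariation to the product of the two absolute complete moments. The data are hypothetical and the function names are illustrative.

```python
from fractions import Fraction

def partial_moments(values, freqs, freqs2, r=1):
    """Positive and negative aggregate complete moments of order r."""
    m = len(values)
    positive = sum(
        freqs2[j] * freqs[i] * (values[i] - values[j]) ** r
        for j in range(m) for i in range(j + 1, m)
    )
    negative = sum(
        freqs2[j] * freqs[i] * (values[j] - values[i]) ** r
        for j in range(m) for i in range(j + 1)
    )
    return positive, negative

def absolute_complete_moment(values, freqs, r=1):
    return sum(
        fi * fj * abs(xi - xj) ** r
        for xi, fi in zip(values, freqs)
        for xj, fj in zip(values, freqs)
    )

def overlap_measure(values, freqs, freqs2, r=1):
    mean = Fraction(sum(f * x for x, f in zip(values, freqs)), sum(freqs))
    mean2 = Fraction(sum(f * x for x, f in zip(values, freqs2)), sum(freqs2))
    positive, negative = partial_moments(values, freqs, freqs2, r)
    # Deviations X_i - X'_j opposite in sign to A - A' form the transvariation.
    transvariation = negative if mean > mean2 else positive
    denom = (absolute_complete_moment(values, freqs, r)
             * absolute_complete_moment(values, freqs2, r))
    return Fraction(4 * transvariation ** 2, denom)

values = [1, 2, 4, 7]
identical = overlap_measure(values, [3, 5, 2, 1], [3, 5, 2, 1])
disjoint = overlap_measure(values, [3, 5, 0, 0], [0, 0, 2, 1])
print(identical, disjoint)  # 1 0
```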


5. Need for further developments. All of the measures above described
were defined for the case of finite sets of magnitudes, expressed as frequency
distributions $D$ and $D'$. Now these sets of magnitudes may be thought of as
samples drawn out of their corresponding universes. The consideration of these
universes would lead to more general representations under the form of frequency
functions, and the above measures would be expressed as definite integrals rather
than summations. This draws attention to the need for tests of significance of
the magnitude of all the above measures, especially those of overlapping, in
order to allow for sampling fluctuation. Obviously, when the frequency functions
are of the asymptotic type some amount of overlapping will always exist.



ON A PROBLEM OF ESTIMATION OCCURRING IN PUBLIC OPINION
POLLS

By Henry B. Mann

Ohio State University

To arrive at an estimate of the number of electoral votes that will be cast for
a presidential candidate a poll is taken of $\lambda_i N$ interviews in the ith state ($i = 1,
\ldots, 48$), where the $\lambda_i$ are fixed constants $> 0$ such that $\sum \lambda_i = 1$, and the respondent
is asked for which candidate he intends to cast his vote. To estimate
the number of electoral votes which candidate A will receive, the electoral votes
of all the states in which the poll shows a majority for candidate A are added
and their sum is used as an estimate for the number of electoral votes which
candidate A will receive. In this paper certain properties of this estimate will
be discussed. It will be shown that it is a biased but consistent estimate and
an upper bound for the bias will be derived. Finally we shall derive that distribution
of interviews which minimizes the variance of our estimate.

In all that follows we shall consider the poll as a random or stratified random
sample and shall disregard the bias introduced by inaccurate answers. Our
results however remain valid as long as the sampling variance is proportional
to $1/N$.

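We shall use the following notation:

$\pi_i$ = proportion of voters in the ith state who intend to vote for candidate A.

$\epsilon_i = 1$ if $\pi_i > \tfrac{1}{2}$, $\epsilon_i = 0$ if $\pi_i < \tfrac{1}{2}$.

$w_i$ = number of electoral votes of the ith state.

$p_i$, $e_i$ = sample values of $\pi_i$ and $\epsilon_i$ resp.

We shall further exclude the case $\pi_i = \tfrac{1}{2}$.

The number of electoral votes for candidate A is then given by

$\sum_{i=1}^{48} \epsilon_i w_i = \Gamma$.

As an estimate of $\Gamma$ we use the quantity

(1) $\sum_{i=1}^{48} e_i w_i = G$.

Let $P_i$ be the probability that $p_i > \tfrac{1}{2}$ and hence $e_i = 1$. Let $\lambda_i N = N_i$ be the
number of interviews in the ith state. If $N_i$ is not too small then $P_i$ is given by

(2) $P_i = \dfrac{1}{\sqrt{2\pi}\,\sigma_i} \int_{1/2}^{\infty} e^{-(x - \pi_i)^2 / 2\sigma_i^2}\,dx$.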
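In this formula $\sigma_i = \sqrt{\pi_i(1 - \pi_i)/N_i}$ if the sample is an unstratified random
sample and may be somewhat less if the sample is a stratified random sample.¹
For our purposes it is sufficient to assume that $\sigma_i$ is proportional to $1/\sqrt{N_i}$.

We then have $E(e_i) = P_i$ and

(3) $E(G) = \sum_{i=1}^{48} E(e_i w_i) = \sum_{i=1}^{48} P_i w_i$.

Hence $G$ is a biased estimate of $\Gamma$. On the other hand² $\operatorname{plim}_{N\to\infty} p_i = \pi_i$ and
hence $\operatorname{plim}_{N\to\infty} e_i = \epsilon_i$ and therefore $\operatorname{plim}_{N\to\infty} G = \Gamma$. That is to say, $G$ is a
consistent estimate of $\Gamma$.

According to (3) the bias is given by

(4) $B(N) = \sum_{i=1}^{48} \epsilon_i w_i - \sum_{i=1}^{48} P_i w_i = \sum_{i=1}^{48} (\epsilon_i - P_i) w_i$.

We have

$\epsilon_i - P_i = -\dfrac{1}{\sqrt{2\pi}} \int_{(1/2 - \pi_i)/\sigma_i}^{\infty} e^{-x^2/2}\,dx$ if $\pi_i < \tfrac{1}{2}$,

$\epsilon_i - P_i = \dfrac{1}{\sqrt{2\pi}} \int_{(\pi_i - 1/2)/\sigma_i}^{\infty} e^{-x^2/2}\,dx$ if $\pi_i > \tfrac{1}{2}$.

For a stratified as well as for an unstratified sample $\sigma_i$ is proportional to $1/\sqrt{N_i}$
and we therefore put

(5) $\dfrac{\tfrac{1}{2} - \pi_i}{\sigma_i} = \gamma_i \sqrt{N_i}$ if $\pi_i < \tfrac{1}{2}$, $\qquad \dfrac{\tfrac{1}{2} - \pi_i}{\sigma_i} = -\gamma_i \sqrt{N_i}$ if $\pi_i > \tfrac{1}{2}$.

Then we have in both cases

(6) $|\epsilon_i - P_i| = \dfrac{1}{\sqrt{2\pi}} \int_{\gamma_i\sqrt{N_i}}^{\infty} e^{-x^2/2}\,dx$.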

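We have for $a > 0$

$\int_a^\infty e^{-x^2/2}\,dx < h\left(e^{-a^2/2} + e^{-(a+h)^2/2} + e^{-(a+2h)^2/2} + \cdots\right)$

$< e^{-a^2/2}\, h\left(1 + e^{-ah} + e^{-2ah} + \cdots\right)$

$= e^{-a^2/2}\, \dfrac{h}{1 - e^{-ah}}$

for every value $h > 0$.

Since $\lim_{h\to 0} \dfrac{h}{1 - e^{-ah}} = \dfrac{1}{a}$ we have

(7) $\int_a^\infty e^{-x^2/2}\,dx \le \dfrac{e^{-a^2/2}}{a}$ for every $a > 0$.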
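The chain of inequalities leading to (7) can be checked numerically. The sketch below evaluates the tail integral through the complementary error function of the Python standard library and compares it with the geometric-series bound for several step widths $h$ and with its limit $e^{-a^2/2}/a$; the function names are illustrative.

```python
import math

def upper_tail(a):
    """Integral of exp(-x^2/2) from a to infinity."""
    return math.sqrt(math.pi / 2.0) * math.erfc(a / math.sqrt(2.0))

def series_bound(a, h):
    """exp(-a^2/2) * h / (1 - exp(-a*h)), the bound obtained for step width h."""
    return math.exp(-a * a / 2.0) * h / (1.0 - math.exp(-a * h))

def limit_bound(a):
    """Limit of series_bound(a, h) as h tends to 0."""
    return math.exp(-a * a / 2.0) / a

for a in (0.5, 1.0, 2.0, 3.0):
    print(a, upper_tail(a), limit_bound(a), series_bound(a, 0.1))
```

For each $a$ the three printed quantities appear in increasing order, and the series bound decreases toward the limit as $h$ shrinks.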


¹ The variance in public opinion polls is somewhat larger than the random sampling
variance due to the fact that a cluster sample is used and not a random sample. For the
same reason the estimate $p_i$ of $\pi_i$ may be biased.

² For the notation used here see: H. B. Mann and A. Wald, "On stochastic limit and
order relationships", Annals of Math. Stat., (1943), pp. 217-227.
From (6) and (7) we obtain

(8) $|\epsilon_i - P_i| < \dfrac{e^{-\gamma_i^2 N_i/2}}{\sqrt{2\pi N_i}\,\gamma_i}$.

From (4) and (8) we have

(9) $|B(N)| < \sum_{i=1}^{48} \dfrac{w_i\, e^{-\gamma_i^2 N_i/2}}{\sqrt{2\pi N_i}\,\gamma_i}$.

Formula (9) is valid whenever $\pi_i \neq \tfrac{1}{2}$ and shows that $B(N)$ converges rapidly
to 0 for all values $\pi_i \neq \tfrac{1}{2}$.

To obtain an approximate idea of the magnitude of the bias we may in (4)
replace $\epsilon_i$ and $P_i$ by their sample values $e_i$ and $r_i$. The quantity $\sum_{i=1}^{48} w_i
(e_i - r_i)$ can, however, not be regarded as an estimate of $B(N)$.

We now proceed to compute the standard error of $G$. We may consider the
poll as 48 single experiments where the probability of success in the ith experiment
is given by $P_i$ where, with

$P_i^* = \dfrac{1}{\sqrt{2\pi}} \int_{\gamma_i\sqrt{N_i}}^{\infty} e^{-x^2/2}\,dx$,

$P_i = P_i^*$ if $\pi_i < \tfrac{1}{2}$, $\qquad P_i = 1 - P_i^*$ if $\pi_i > \tfrac{1}{2}$.

Hence the variance of $G$ is given by

(10) $\sigma^2 = \sum_{i=1}^{48} P_i (1 - P_i)\, w_i^2$.

As an estimate of $\sigma^2$ we can use the quantity $S^2$ obtained by replacing $P_i$ by
its sample value.
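To see the orders of magnitude involved, the following sketch evaluates the expected estimate, the bias (4), the bound (9) and the variance (10) for a few states. The shares, interview counts and electoral votes are hypothetical illustration values, and the unstratified formula for $\sigma_i$ is assumed.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poll_summary(states):
    """states: list of (share, interviews, electoral_votes) with share != 0.5."""
    true_total = expected = bias_bound = variance = 0.0
    for share, n, w in states:
        sigma = math.sqrt(share * (1.0 - share) / n)               # unstratified sample
        prob_majority = 1.0 - normal_cdf((0.5 - share) / sigma)    # P_i of formula (2)
        eps = 1.0 if share > 0.5 else 0.0
        gamma = abs(0.5 - share) / (sigma * math.sqrt(n))          # gamma_i of formula (5)
        true_total += eps * w
        expected += prob_majority * w
        bias_bound += (w * math.exp(-gamma ** 2 * n / 2.0)
                       / (math.sqrt(2.0 * math.pi * n) * gamma))   # summand of (9)
        variance += prob_majority * (1.0 - prob_majority) * w ** 2  # summand of (10)
    bias = true_total - expected
    return true_total, expected, bias, bias_bound, variance

# Hypothetical states: (share for A, number of interviews, electoral votes).
states = [(0.52, 400, 25), (0.47, 300, 10), (0.60, 200, 8)]
true_total, expected, bias, bias_bound, variance = poll_summary(states)
print(true_total, round(expected, 3), round(bias, 3),
      round(bias_bound, 3), round(variance, 3))
```

States whose share lies close to one half dominate both the bias and the variance, which is what motivates the allocation problem treated next.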

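We shall consider that distribution of interviews as best which minimizes

$E[(G - \Gamma)^2]$.

We have

$E[(G - \Gamma)^2] = \sigma^2 + B^2(N)$.

We therefore consider the problem of minimizing $\sigma^2 + B^2(N)$ under the restriction
$\sum_{i=1}^{48} N_i = N$.

We have

$\dfrac{\partial \sigma^2}{\partial N_i} = \dfrac{\partial \sigma^2}{\partial P_i}\,\dfrac{\partial P_i}{\partial N_i} = w_i^2 (1 - 2P_i)\,\dfrac{\partial P_i}{\partial N_i}$,

$\dfrac{\partial B^2(N)}{\partial N_i} = 2B(N)\,\dfrac{\partial B(N)}{\partial N_i} = -2 w_i B(N)\,\dfrac{\partial P_i}{\partial N_i}$,

$\dfrac{\partial P_i}{\partial N_i} = -\dfrac{1}{\sqrt{2\pi}}\, e^{-\gamma_i^2 N_i/2}\, \dfrac{\gamma_i}{2\sqrt{N_i}}$ if $\pi_i < \tfrac{1}{2}$,

$\dfrac{\partial P_i}{\partial N_i} = \dfrac{1}{\sqrt{2\pi}}\, e^{-\gamma_i^2 N_i/2}\, \dfrac{\gamma_i}{2\sqrt{N_i}}$ if $\pi_i > \tfrac{1}{2}$.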
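Hence applying the method of Lagrange multipliers, we obtain

(11) $\dfrac{\partial P_i}{\partial N_i}\, w_i \left[ w_i (1 - 2P_i) - 2B(N) \right] = \lambda, \qquad i = 1, \ldots, 48$,

$\sum_{i=1}^{48} N_i = N$.

The parameters $\gamma_i$ and $\pi_i$ in equation (11) can be estimated from a previous
poll.³ It is not certain that (11) always has solutions. However if the quantity
$\sigma^2 + B^2(N)$ has a minimum for a set of values $N_1, \ldots, N_{48}$ with $N_i \neq 0$ ($i = 1,
\ldots, 48$) then (11) must have a solution.

One might be induced to try to estimate $\sum P_i w_i$ directly by using

$r_i = \dfrac{1}{\sqrt{2\pi}\,\sigma_i} \int_{1/2}^{\infty} e^{-(x - p_i)^2/2\sigma_i^2}\,dx$

as an estimate of $P_i$. It is easy to see that $r_i$ is a consistent
estimate of $\epsilon_i$. It will be shown however that this estimate is more
biased than the estimate (1).

Since $\sigma_i$ differs only very little from its sample estimate we may replace this
sample estimate by $\sigma_i$. We then have

$E(r_i) = \dfrac{1}{2\pi\sigma_i^2} \int_{-\infty}^{\infty} \left( \int_{1/2}^{\infty} e^{-(x - p_i)^2/2\sigma_i^2}\,dx \right) e^{-(p_i - \pi_i)^2/2\sigma_i^2}\,dp_i$.

Now

$(x - p_i)^2 + (p_i - \pi_i)^2 = \dfrac{(x - \pi_i)^2}{2} + 2\left(p_i - \dfrac{x + \pi_i}{2}\right)^2$.

Hence

$E(r_i) = \dfrac{1}{2\pi\sigma_i^2} \int_{1/2}^{\infty} e^{-(x - \pi_i)^2/4\sigma_i^2} \left( \int_{-\infty}^{\infty} e^{-(p_i - (x + \pi_i)/2)^2/\sigma_i^2}\,dp_i \right) dx$.

The second integral is equal to $\sqrt{\pi}\,\sigma_i$. Hence

(12) $E(r_i) = \dfrac{1}{\sqrt{2\pi}} \int_{(1/2 - \pi_i)/(\sigma_i\sqrt{2})}^{\infty} e^{-x^2/2}\,dx$.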


³ If for any $i$, $\pi_i$ were very close to $\tfrac{1}{2}$ then it would be of little use to poll the ith state.
Hence, in this case formula (11) gives a small value for $N_i$. However, the $\pi_i$ are never
accurately known. The following procedure might be recommended for determining the
best distribution of interviews: If for one particular $i$ the sample value of $\pi_i$ as estimated
from a previous poll is too close to $\tfrac{1}{2}$, determine, using the $N_i$ of the previous poll, that value
$\pi'_i$ for which the probability is [...] that $p_i$ is larger than $\tfrac{1}{2}$ and substitute in (11) $\pi'_i$
for $\pi_i$. In all other cases substitute the sample value.

If several polls are taken it is advisable to use all of them but the last one to estimate
as closely as possible the values of the $\pi_i$. The sample of the last poll before the election
should be distributed according to (11).
From (12) we see that $E(r_i) < P_i$ if $\pi_i > \tfrac{1}{2}$ and $E(r_i) > P_i$ if $\pi_i < \tfrac{1}{2}$.

Thus in every case this estimate is more biased than the estimate (1).
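The closed form (12) and the comparison with $P_i$ can be checked by direct numerical integration. The sketch below assumes that $p_i$ is normally distributed about the true share with a known standard deviation and uses a plain trapezoidal rule; the share and standard deviation are hypothetical values and the function names are illustrative.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_r(share, sigma, steps=20000, span=10.0):
    """Trapezoidal estimate of E(r_i) when p_i ~ N(share, sigma^2)."""
    h = 2.0 * span / steps
    total = 0.0
    for k in range(steps + 1):
        z = -span + k * h
        p = share + sigma * z
        r = 1.0 - normal_cdf((0.5 - p) / sigma)   # r_i evaluated at the sample value p
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(z) * r
    return total * h

def closed_form(share, sigma):
    """Right-hand side of formula (12)."""
    return 1.0 - normal_cdf((0.5 - share) / (sigma * math.sqrt(2.0)))

def prob_majority(share, sigma):
    """P_i of formula (2)."""
    return 1.0 - normal_cdf((0.5 - share) / sigma)

for share in (0.45, 0.55):
    print(share, expected_r(share, 0.03), closed_form(share, 0.03),
          prob_majority(share, 0.03))
```

In both cases the expectation of $r_i$ lies farther from the true indicator (0 or 1) than $P_i$ does.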

On the other hand, we shall now show that $E[(\varepsilon_i - r_i)^2]$ is always smaller than
$E[(\varepsilon_i - e_i)^2]$. Since $\varepsilon_i = 1$ if $\pi_i > \tfrac12$ and $\varepsilon_i = 0$ if $\pi_i < \tfrac12$, it is easy to verify that
$E[(\varepsilon_i - r_i)^2]$ has the same value for $\pi_i = a$ as for $\pi_i = 1 - a$, and the same is true
for $E[(\varepsilon_i - e_i)^2]$. We may, therefore, without loss of generality assume that
$\pi_i \le \tfrac12$.

Thus we have to show that 


(13) E(r’) < E(e*) = Pi - f dx if ^ f 

We have 

Bfr! > - vk C • — , ’“' i <k 

. 

Now 

0(x, y, pi) = (x - p^ + (y - pi)* + (pi - in)* 

x + 


(p< ~ X -±JL±* J + 1 -(x + y- 2n)' + \{x- y)\ 


Putting 




Pi = 


\/6 

, J_ (* - y) i - 2 t< 

y " V2 *i ’ Vto “ °’ 


(x + y — 2Q 


<Ti 


we obtain 


B< * - (vfe 1 C C (Ck *) *' <,p ' 


. > r.-~ 

Now for $\pi_i = \tfrac12$ we have $a = 0$, and for $\pi_i < \tfrac12$ we have $a > 0$. For $a = 0$ we
obviously have $E(r_i^2) < E(e_i^2)$. Further $\lim_{a \to \infty} E(r_i^2) = \lim_{a \to \infty} E(e_i^2) = 0$; hence (13)
is proved if we can show that

F(a) = *(rj) - £(«?) - ~ f «** e -»«* dydx - -7= f e^'dx 

2r it «V»c«-*) v 2 t J ^/j a
is a monotonically increasing function of a. Differentiating F(a) with respect
to a we obtain


dF(a) __ _ \/3 f* c -i(4x*-««a:+8o*) V3 ^-<8/4)a* 
ir J 0 2%/ir 


da 


(14) 


“~V / 3 e ~(3/4)c 


_ -V3 -(»/4)a> / ^-^3 


7' 

J a 

f 

* a 


e 


4(x—(8/4)a)* 


V§ ^-(8/4)0* 


d*+ 

2\/ 7T 

VO a ~(8/4)o» 


dz + e 

2 >/V 


Hence for a > 0 we have 


f > ZV| e -i“ s + V3 e-f* > o. 

2\/27r 2 \Ztt 


Hence we have proved 


(15) 


£[(«< - r,) s ] = i. f e~* l! |0|> e - ** dy dx < £[(«,• - e,)’], 


a = 


V3(|o|- 

1 - 2 Ti 


V 6 


CT* 


Since 


E[(« - Cj) ! ] - E[( u - r.) 2 ] 
is largest when *•< = £ we also have 

*<« - a i«- * i - [I - h C C e "” * dt ] 


or 

(16) [a- Pi \> EKn - r,) 2 ] > ± jf° e-*’ e~^ dy dx - | * - «|. 

Because of (15), $r_i$, although more biased, may in many cases be preferable
to $e_i$ as an estimate of $\varepsilon_i$.



NOTES 

This section is devoted to brief research and expository articles, notes on
methodology and other short items.


A COMBINATORIAL FORMULA AND ITS APPLICATION TO THE 
THEORY OF PROBABILITY OF ARBITRARY EVENTS 1 

By Kai-Lai Chung and Lietz C. Hsu 
National Southwest Associated University, Kunming, China

An important principle, known as a proposition in formal logic or the method
of cross-classification, can be stated as follows. 1

Let F and f be any two functions of combinations out of $(\nu) = (1, 2, \ldots, n)$.
Then the two formulas

(1.1)  $F((\alpha)) = \sum_{(\beta) \in (\nu) - (\alpha)} f((\alpha) + (\beta))$

(2.1)  $f((\alpha)) = \sum_{(\beta) \in (\nu) - (\alpha)} (-1)^b \, F((\alpha) + (\beta))$

are equivalent.

As an immediate application to the theory of probability of arbitrary events,
we have the set of inversion formulas 2

(3.1)  $p((\alpha)) = \sum_{(\beta) \in (\nu) - (\alpha)} p[(\alpha) + (\beta)]$

(4.1)  $p[(\alpha)] = \sum_{(\beta) \in (\nu) - (\alpha)} (-1)^b \, p((\alpha) + (\beta))$

where $p((\alpha))$ is the probability of the occurrence of at least $E_{\alpha_1}, E_{\alpha_2}, \ldots, E_{\alpha_a}$
out of n arbitrary events $E_1, E_2, \ldots, E_n$ and $p[(\alpha)]$ is the probability of the
occurrence of $E_{\alpha_1}, E_{\alpha_2}, \ldots, E_{\alpha_a}$ and no others among the n events, $(\alpha_1, \alpha_2, \ldots, \alpha_a)$
denoting a combination of the integers $(1, 2, \ldots, n)$. They can be
made to play a central rôle in the theory, since they supply a method for
converting the fundamental systems of probabilities, $p[(\alpha)]$ and $p((\alpha))$, one into the
other.
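
As a small numerical check of (3.1) and (4.1), the following Python sketch builds an arbitrary distribution over the exact outcomes of four events, forms the "at least" probabilities by (3.1), and recovers the exact probabilities by (4.1). The function names and the random test distribution are illustrative assumptions and are not part of the original note.

```python
from itertools import combinations
import random


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def at_least_from_exact(exact, universe):
    """Formula (3.1): p((a)) is the sum of p[(a)+(b)] over (b) in (nu)-(a)."""
    return {
        a: sum(exact[a | b] for b in subsets(universe - a))
        for a in subsets(universe)
    }


def exact_from_at_least(at_least, universe):
    """Formula (4.1): p[(a)] is the sum of (-1)^b p((a)+(b)) over (b) in (nu)-(a)."""
    return {
        a: sum((-1) ** len(b) * at_least[a | b] for b in subsets(universe - a))
        for a in subsets(universe)
    }


def random_exact_distribution(universe, seed=0):
    """Hypothetical test data: random probabilities p[(a)] that sum to 1."""
    rng = random.Random(seed)
    weights = {a: rng.random() for a in subsets(universe)}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


universe = frozenset({1, 2, 3, 4})
exact = random_exact_distribution(universe)
at_least = at_least_from_exact(exact, universe)
recovered = exact_from_at_least(at_least, universe)
max_error = max(abs(exact[a] - recovered[a]) for a in exact)
```

Here `max_error` measures how closely (4.1) undoes (3.1); it should differ from zero only by floating-point rounding.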

We may further generalize (1.1) and (2.1) by considering combinations with
repetitions. Let such a combination be written as

$(\alpha) = (\alpha^r) = (\alpha_1^{r_1} \alpha_2^{r_2} \cdots \alpha_a^{r_a})$

where $r_i$ ($r_i \ge 1$) denotes the number of repetitions of the number $\alpha_i$, $i = 1, 2, \ldots, a$. Correspondingly we write

$(\alpha)' = (\alpha_1 \alpha_2 \cdots \alpha_a)$

and call it the reduced combination corresponding to $(\alpha)$.

1 For the notations and definitions see K. L. Chung, “On fundamental systems of probabilities of a finite number of events,” Annals of Math. Stat., Vol. 14 (1943), pp. 123-133.

2 Cf. Fréchet, Les probabilités associées à un système d'événements compatibles et dépendants, Hermann, Paris (1939), formulas (55) and (58).

If there are n distinct elements $(1, 2, \ldots, n)$ in question, we may write every
combination in the form

$(1^{r_1} 2^{r_2} \cdots n^{r_n})$

where each $r_i$ is zero or a positive integer. We say that $(1^{s_1} 2^{s_2} \cdots n^{s_n})$ belongs
to $(1^{r_1} 2^{r_2} \cdots n^{r_n})$ and write

$(1^{s_1} 2^{s_2} \cdots n^{s_n}) \in (1^{r_1} 2^{r_2} \cdots n^{r_n})$

if and only if for each i, $i = 1, 2, \ldots, n$, we have $s_i \le r_i$. We write

$(1^{r_1} 2^{r_2} \cdots n^{r_n}) + (1^{s_1} 2^{s_2} \cdots n^{s_n}) = (1^{r_1+s_1} 2^{r_2+s_2} \cdots n^{r_n+s_n});$

and if $(1^{s_1} 2^{s_2} \cdots n^{s_n}) \in (1^{r_1} 2^{r_2} \cdots n^{r_n})$,

$(1^{r_1} 2^{r_2} \cdots n^{r_n}) - (1^{s_1} 2^{s_2} \cdots n^{s_n}) = (1^{r_1-s_1} 2^{r_2-s_2} \cdots n^{r_n-s_n}).$


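We define a generalized Möbius function $\mu((\alpha))$ for combinations (with or without
repetitions) as follows

$\mu((\alpha)) = \begin{cases} (-1)^a & \text{if } (\alpha) = (\alpha)' \\ 0 & \text{if } (\alpha) \neq (\alpha)'. \end{cases}$

This function has the property

$\sum_{(\beta) \in (\alpha)} \mu((\beta)) = \begin{cases} 1 & \text{if } (\alpha) = (0) \\ 0 & \text{if } (\alpha) \neq (0). \end{cases}$

For we have

$\sum_{(\beta) \in (\alpha)} \mu((\beta)) = \sum_{(\beta) \in (\alpha)'} (-1)^b = \sum_{b=0}^{a} \binom{a}{b} (-1)^b = \begin{cases} 1 & \text{if } a = 0 \\ 0 & \text{if } a \neq 0 \end{cases} = \begin{cases} 1 & \text{if } (\alpha) = (0) \\ 0 & \text{if } (\alpha) \neq (0). \end{cases}$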
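
The summation property of the generalized Möbius function can be checked by direct enumeration. The sketch below is only an illustration: it represents a combination $(1^{r_1} 2^{r_2} \cdots n^{r_n})$ by its tuple of multiplicities, which is a convention assumed here, and the helper names are hypothetical.

```python
from itertools import product


def mobius(combo):
    """mu((a)): (-1)^a if no element is repeated, otherwise 0.

    `combo` is the tuple of multiplicities (r_1, ..., r_n).
    """
    if any(r > 1 for r in combo):
        return 0
    return (-1) ** sum(combo)


def sub_combinations(combo):
    """All (s_1, ..., s_n) with 0 <= s_i <= r_i, i.e. every (beta) belonging to (alpha)."""
    return product(*(range(r + 1) for r in combo))


def mobius_sum(combo):
    """Sum of mu((beta)) over every (beta) belonging to (alpha)."""
    return sum(mobius(beta) for beta in sub_combinations(combo))


results = {
    combo: mobius_sum(combo)
    for combo in [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 2, 1)]
}
```

Only the empty combination `(0, 0, 0)` gives a sum of 1; every other entry of `results` is 0, in agreement with the property stated above.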

Now we state and prove the following general theorem. 

Theorem. Let $(\alpha)_i = (\alpha_{i1}^{r_{i1}} \alpha_{i2}^{r_{i2}} \cdots \alpha_{ia_i}^{r_{ia_i}})$ and $(\nu)_i = (1^{\lambda_{i1}} 2^{\lambda_{i2}} \cdots n_i^{\lambda_{in_i}})$
where $\lambda_{ij}$ and $n_i$ are finite and $1 \le r_{ij} \le \lambda_{ij}$, $1 \le a_i \le n_i$ for $i = 1, 2, \ldots, m$
and $j = 1, 2, \ldots, n_i$. Then for any two functions of the m combinations (with
repetitions), $(\alpha)_1, (\alpha)_2, \ldots, (\alpha)_m$ out of $(\nu)_1, (\nu)_2, \ldots, (\nu)_m$, the two sets of
formulas:

(1)  $F((\alpha)_1, (\alpha)_2, \ldots, (\alpha)_m) = \sum_{(\beta)_i \in (\nu)_i - (\alpha)_i} f((\alpha)_1 + (\beta)_1, (\alpha)_2 + (\beta)_2, \ldots, (\alpha)_m + (\beta)_m)$

and

(2)  $f((\alpha)_1, (\alpha)_2, \ldots, (\alpha)_m) = \sum_{(\beta)_i \in (\nu)_i - (\alpha)_i} \Big[ \prod_{i=1}^{m} \mu((\beta)_i) \Big] F((\alpha)_1 + (\beta)_1, (\alpha)_2 + (\beta)_2, \ldots, (\alpha)_m + (\beta)_m)$

are equivalent.

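Proof. To deduce (2) from (1):

$\sum_{(\beta)_i \in (\nu)_i - (\alpha)_i} \Big[ \prod_{i=1}^{m} \mu((\beta)_i) \Big] F((\alpha)_1 + (\beta)_1, \ldots, (\alpha)_m + (\beta)_m)$

$= \sum_{(\beta)_i \in (\nu)_i - (\alpha)_i} \Big[ \prod_{i=1}^{m} \mu((\beta)_i) \Big] \sum_{(\gamma)_i \in (\nu)_i - (\alpha)_i - (\beta)_i} f((\alpha)_1 + (\beta)_1 + (\gamma)_1, \ldots, (\alpha)_m + (\beta)_m + (\gamma)_m)$

$= \sum_{(\delta)_i \in (\nu)_i - (\alpha)_i} f((\alpha)_1 + (\delta)_1, \ldots, (\alpha)_m + (\delta)_m) \cdot \sum_{(\gamma)_i \in (\delta)_i} \prod_{i=1}^{m} \mu((\delta)_i - (\gamma)_i).$

Evidently we have

$\sum_{(\gamma)_i \in (\delta)_i} \prod_{i=1}^{m} \mu((\delta)_i - (\gamma)_i) = \prod_{i=1}^{m} \Big\{ \sum_{(\gamma)_i \in (\delta)_i} \mu((\gamma)_i) \Big\} = \begin{cases} 1 & \text{if } (\delta)_i = (0) \text{ for } i = 1, \ldots, m \\ 0 & \text{otherwise} \end{cases}$

by the property of the $\mu$-function. Hence the preceding sum reduces to
$f((\alpha)_1, \ldots, (\alpha)_m)$ in accord with (2).

(1) is deduced from (2) in a similar way.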

Although the general case is not without importance in the treatment of
several sets of events, 3 we shall for the sake of convenience restrict ourselves to
the special case m = 1.

3 Cf. Fréchet, Loc. Cit. pp. 50-52; also, K. L. Chung, “Generalization of Poincaré's formula in the theory of probability,” Annals of Math. Stat., Vol. 14 (1943). We may note that our general theorem may be used to give another proof of the generalized Poincaré's formula for several sets of events.

In order to apply these formulas we must first introduce combinations with
repetitions into the theory of arbitrary events. This can be done in various
ways. Firstly, we may consider the number of occurrences of each event in a
given time-interval or in a series of trials not necessarily independent. Secondly,
we may regard each event as possessing various degrees of intensity. If the
event $E_i$ occurs $r_i$ times in a given time-interval or occurs with $r_i$ degrees of
intensity, we write it as $E_i^{r_i}$. Hereafter we shall make use of the first interpretation
and we shall assume that the maximum number of occurrences of each event
is finite:

$0 \le r_i \le \lambda_i, \quad i = 1, \ldots, n.$

We define

$p[E_1^{r_1} \cdots E_n^{r_n}] = p[(\nu^r)]$ = The probability that $E_i$ occurs exactly $r_i$
times in the given time-interval.

$p(E_1^{r_1} \cdots E_n^{r_n}) = p((\nu^r))$ = The probability that $E_i$ occurs at least $r_i$
times in the given time-interval.

These quantities play the same rôle as the $p[(\alpha)]$'s and $p((\alpha))$'s in the ordinary
theory. Evidently the probability of every complex event in question can be
expressed as the sum of certain $p[(\nu^r)]$'s. To prove that the $p((\nu^r))$'s also form
a fundamental system of quantities we have only to express the $p[(\nu^r)]$'s in terms of
the $p((\nu^r))$'s. This is given immediately by an application of the general
theorem with m = 1. For we have in an obvious way

$p(E_1^{r_1} \cdots E_n^{r_n}) = \sum_{r_i \le s_i \le \lambda_i} p[E_1^{s_1} \cdots E_n^{s_n}]$

or

(3)  $p((\nu^r)) = \sum_{(\nu^s) \in (\nu^\lambda) - (\nu^r)} p[(\nu^r) + (\nu^s)].$

Hence we obtain the inversion

(4)  $p[(\nu^r)] = \sum_{(\nu^s) \in (\nu^\lambda) - (\nu^r)} \mu((\nu^s)) \, p((\nu^r) + (\nu^s)).$

Let $(\alpha')$ denote a running combination without repetitions. Then since $\mu((\nu^s)) = 0$
unless $(\nu^s)$ is an $(\alpha')$,

(4')  $p[(\nu^r)] = \sum_{(\alpha') \in (\nu^{\lambda - r})} \mu((\alpha')) \, p((\nu^r) + (\alpha')) = \sum_{(\alpha') \in (\nu^{\lambda - r})} (-1)^a \, p((\nu^r) + (\alpha')).$

The set of formulas (3) and (4) generalize (3.1) and (4.1).
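
Formulas (3) and (4) can be illustrated numerically in the same way as the ordinary case. In the following sketch a combination is again stored as a tuple of multiplicities. The maximum counts in `lam` and the random distribution of exact counts are hypothetical values chosen only for the check.

```python
from itertools import product
import random


def mobius(combo):
    """mu((a)): (-1)^a if no element is repeated, otherwise 0."""
    if any(r > 1 for r in combo):
        return 0
    return (-1) ** sum(combo)


def below(limit):
    """All tuples s with 0 <= s_i <= limit_i, i.e. every combination belonging to `limit`."""
    return list(product(*(range(k + 1) for k in limit)))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


lam = (2, 1, 3)  # hypothetical maximum number of occurrences of E_1, E_2, E_3

# Hypothetical exact probabilities p[(nu^r)], normalised to sum to 1.
rng = random.Random(1)
weights = {r: rng.random() for r in below(lam)}
total = sum(weights.values())
exact = {r: w / total for r, w in weights.items()}

# Formula (3): "at least" probabilities from the exact ones.
at_least = {
    r: sum(exact[add(r, s)] for s in below(sub(lam, r)))
    for r in below(lam)
}

# Formula (4): exact probabilities recovered with the generalized Moebius function.
recovered = {
    r: sum(mobius(s) * at_least[add(r, s)] for s in below(sub(lam, r)))
    for r in below(lam)
}

max_error = max(abs(exact[r] - recovered[r]) for r in exact)
```

Because `mobius` vanishes on every combination with a repeated element, the sum in (4) effectively runs only over combinations without repetitions, which is the content of (4').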

Corresponding to the $P_{[a]}((\nu))$ for the ordinary events we define for $a + b + \cdots = n$
and $r, s, \ldots$ all distinct:

$P^{r,s,\ldots}_{[a],[b],\ldots}(E_1^{\lambda_1} \cdots E_n^{\lambda_n})$ = The probability that among n events $E_1, E_2, \ldots, E_n$
exactly a events occur r times, exactly b events occur s times and so on.
By (4) we easily obtain

(5)  $P^{r,s,\ldots}_{[a],[b],\ldots}((\nu^\lambda)) = \sum \sum_{(\nu^t) \in (\nu^\lambda) - ((\alpha)^r + (\beta)^s + \cdots)} \mu((\nu^t)) \, p((\nu^t) + (\alpha)^r + (\beta)^s + \cdots)$

where $(\alpha)^r = (E_{\alpha_1}^r \cdots E_{\alpha_a}^r)$, $(\beta)^s = (E_{\beta_1}^s \cdots E_{\beta_b}^s)$, $\ldots$ and the first summation
is a symmetric sum which extends to all $n!/a!\,b! \cdots$ different combinations
$(\alpha_1 \cdots \alpha_a)$, $(\beta_1 \cdots \beta_b)$, $\ldots$ out of $(\nu) = (1, 2 \cdots n)$.

The equality (5) is obviously a generalization of Poincaré's formula.
Similarly for the probabilities in the definition of which the word “exactly”
is sometimes substituted for the words “at least.” Of course we can express
all of them in terms of the $p[(\nu^r)]$'s or of the $p((\nu^r))$'s. However elegant formulas
such as in the ordinary theory seem to be lacking.

Finally, we may also consider conditions of existence for the $p[(\nu^r)]$'s and the
$p((\nu^r))$'s. For the former system the conditions are that they be all non-negative
and that their sum be 1. For the latter system, the conditions are given by
(4'), viz. for every $(\nu^r) \in (\nu^\lambda)$,

$\sum_{(\alpha') \in (\nu^{\lambda - r})} \mu((\alpha')) \, p((\nu^r) + (\alpha')) \ge 0.$

These conditions are necessary and sufficient since (3) and (4) are equivalent.


ON THE MECHANICS OF CLASSIFICATION 

By Carl F. Kossack 
University of Oregon 

1. Introduction. Wald 1 has recently determined the distribution of the 
statistic U to be used in the classification of an observation, $z_i$ ($i = 1, 2, \ldots, p$),
as coming from one of two populations. He also determined the critical region 
which is most powerful for such a classification. It is the purpose of this paper 
to show how such a classification statistic under the assumption of large sampling 
can be applied in an actual problem and to present a systematic approach to the 
necessary computations. 

The data used in this demonstration are those which were obtained from the
A.S.T.P. pre-engineering trainees assigned to the University of Oregon. The
problem considered is that of classifying a trainee as to whether he will do
unsatisfactory or satisfactory work 2 in the first term mathematics course
(Intermediate Algebra). The variables used in the classification are: (1) A
Mathematics Placement Test Score. This is the score obtained by the trainee on a
fifty-minute elementary mathematics test (including elementary algebra).
The test was given to each trainee on the day that he arrived on the campus.
(2) A High School Mathematics Score. A trainee’s high school mathematics
record was made into a score by giving 1 point to students who had had no high
school algebra, 2 points to students with an F in first-year, high-school algebra
and no second-year algebra, 3 points for a D, ..., 10 points for an average grade
of A in first- and second-year algebra. (3) The Army General Classification
Test Score. An individual needed a score of 115 or better in order to be assigned
to the A.S.T.P. These data were obtained for 305 trainees along with the actual
grade made by them in the algebra course. Trainees who had had college work
were not included in the study.

1 Abraham Wald, “On a statistical problem arising in the classification of an individual
into one of two groups,” Annals of Math. Stat., Vol. 15, (1944), No. 2.

2 Unsatisfactory work was defined as a grade of F or D in the course (failure or the lowest
passing grade).

2. Steps in the Computation of U and the Critical Region. Let

$\pi_1$ be the population of individuals who do unsatisfactory work in their first-term mathematics course.

$\pi_2$ be the population of individuals who do satisfactory work.

$N_1$ and $N_2$ = respectively the number of observed individuals in $\pi_1$ and $\pi_2$.

$x_{1\alpha}$ and $y_{1\alpha}$ = respectively the Mathematics Placement Test Score for the $\alpha$th individual observed in $\pi_1$ and $\pi_2$,

$x_{2\alpha}$ and $y_{2\alpha}$ = respectively the High School Mathematics Score.

$x_{3\alpha}$ and $y_{3\alpha}$ = respectively the Army General Classification Test Score.

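Step 1. Computation of Summations

$N_1 = 96$                                   $N_2 = 209$
$\sum x_{1\alpha} = 3570$                    $\sum y_{1\alpha} = 11450$
$\sum x_{2\alpha} = 547$                     $\sum y_{2\alpha} = 1567$
$\sum x_{3\alpha} = 11745$                   $\sum y_{3\alpha} = 26684$
$\sum x_{1\alpha}^2 = 145476$                $\sum y_{1\alpha}^2 = 672452$
$\sum x_{2\alpha}^2 = 3509$                  $\sum y_{2\alpha}^2 = 12577$
$\sum x_{3\alpha}^2 = 1439559$               $\sum y_{3\alpha}^2 = 3421996$
$\sum x_{1\alpha} x_{2\alpha} = 21012$       $\sum y_{1\alpha} y_{2\alpha} = 88774$
$\sum x_{1\alpha} x_{3\alpha} = 436964$      $\sum y_{1\alpha} y_{3\alpha} = 1469302$
$\sum x_{2\alpha} x_{3\alpha} = 66731$       $\sum y_{2\alpha} y_{3\alpha} = 200150$

$\sum (x_{1\alpha} - \bar x_1)^2 = 12716.625$    $\sum (y_{1\alpha} - \bar y_1)^2 = 45167.311$
$\sum (x_{2\alpha} - \bar x_2)^2 = 392.240$      $\sum (y_{2\alpha} - \bar y_2)^2 = 828.249$
$\sum (x_{3\alpha} - \bar x_3)^2 = 2631.656$     $\sum (y_{3\alpha} - \bar y_3)^2 = 15125.876$
$\sum (x_{1\alpha} - \bar x_1)(x_{2\alpha} - \bar x_2) = 670.438$     $\sum (y_{1\alpha} - \bar y_1)(y_{2\alpha} - \bar y_2) = 2926.392$
$\sum (x_{1\alpha} - \bar x_1)(x_{3\alpha} - \bar x_3) = 196.812$     $\sum (y_{1\alpha} - \bar y_1)(y_{3\alpha} - \bar y_3) = 7427.359$
$\sum (x_{2\alpha} - \bar x_2)(x_{3\alpha} - \bar x_3) = -191.031$    $\sum (y_{2\alpha} - \bar y_2)(y_{3\alpha} - \bar y_3) = 83.837$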

Step 2. Computation of Statistics.

$\bar x_1 = 37.188$       $\bar y_1 = 54.785$
$\bar x_2 = 5.6979$       $\bar y_2 = 7.4976$
$\bar x_3 = 122.3438$     $\bar y_3 = 127.6746$

$s_{ij} = \dfrac{\sum_\alpha (x_{i\alpha} - \bar x_i)(x_{j\alpha} - \bar x_j) + \sum_\alpha (y_{i\alpha} - \bar y_i)(y_{j\alpha} - \bar y_j)}{N_1 + N_2 - 2}$

$s_{11} = 191.04$     $s_{12} = 11.871$
$s_{22} = 4.0280$     $s_{13} = 25.162$
$s_{33} = 58.606$     $s_{23} = -.35378$


Step 3. Computation of Inverse Matrix $\| s^{ij} \|$

$| s_{ij} | = \begin{vmatrix} 191.04 & 11.871 & 25.162 \\ 11.871 & 4.0280 & -.35378 \\ 25.162 & -.35378 & 58.606 \end{vmatrix} = 34053$

$s^{11} = .0069286$     $s^{12} = -.020692$
$s^{22} = .31019$       $s^{13} = -.0030996$
$s^{33} = .018459$      $s^{23} = .010756$
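
Steps 1 to 3 can be retraced from the raw summations alone. The sketch below, offered only as an illustration, recomputes the pooled covariances $s_{ij}$, the determinant, and the inverse matrix in plain Python; the data layout and helper names are assumptions made for this example, while the numbers are those printed in Step 1.

```python
# Raw summations from Step 1; index 0, 1, 2 stands for variables 1, 2, 3.
N1, N2 = 96, 209
sum_x = [3570, 547, 11745]
sum_y = [11450, 1567, 26684]
# Raw sums of squares and cross-products, keyed by the variable pair (i, j).
raw_x = {(0, 0): 145476, (1, 1): 3509, (2, 2): 1439559,
         (0, 1): 21012, (0, 2): 436964, (1, 2): 66731}
raw_y = {(0, 0): 672452, (1, 1): 12577, (2, 2): 3421996,
         (0, 1): 88774, (0, 2): 1469302, (1, 2): 200150}


def corrected(raw, sums, n, i, j):
    """Sum of products of deviations from the means, obtained from raw sums."""
    return raw[(i, j)] - sums[i] * sums[j] / n


def pooled_covariance():
    """Pooled s_ij with N1 + N2 - 2 degrees of freedom (Step 2)."""
    s = [[0.0] * 3 for _ in range(3)]
    for (i, j) in raw_x:
        value = (corrected(raw_x, sum_x, N1, i, j)
                 + corrected(raw_y, sum_y, N2, i, j)) / (N1 + N2 - 2)
        s[i][j] = s[j][i] = value
    return s


def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inverse3(m):
    """Inverse of a 3 x 3 matrix from its cofactors (Step 3)."""
    d = det3(m)
    inv = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            rows = [r for r in range(3) if r != j]
            cols = [c for c in range(3) if c != i]
            minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                     - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
            inv[i][j] = (-1) ** (i + j) * minor / d
    return inv


S = pooled_covariance()
S_inv = inverse3(S)
```

The entries of `S` and `S_inv` agree with the printed values of Steps 2 and 3 to the number of figures given there; small differences in the last digit come from the rounding used in the hand computation.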

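Step 4. Computation of the Classification Equation.

$U = [s^{11}(\bar y_1 - \bar x_1) + s^{12}(\bar y_2 - \bar x_2) + s^{13}(\bar y_3 - \bar x_3)] \cdot z_1$
$\quad + [s^{21}(\bar y_1 - \bar x_1) + s^{22}(\bar y_2 - \bar x_2) + s^{23}(\bar y_3 - \bar x_3)] \cdot z_2$
$\quad + [s^{31}(\bar y_1 - \bar x_1) + s^{32}(\bar y_2 - \bar x_2) + s^{33}(\bar y_3 - \bar x_3)] \cdot z_3$

where $z_i$ plays the same role for individuals to be classified as $x_{i\alpha}$ and $y_{i\alpha}$ do for
observed individuals.

$U = .068160 z_1 + .25147 z_2 + .063215 z_3$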
Step 5. Computation of the Critical Region (assuming $W_1 = W_2$)

$\bar\alpha_1 = .068160 \bar x_1 + .25147 \bar x_2 + .063215 \bar x_3 = 11.702$
$\bar\alpha_2 = .068160 \bar y_1 + .25147 \bar y_2 + .063215 \bar y_3 = 13.691$
$\tfrac12 (\bar\alpha_1 + \bar\alpha_2) = 12.696$

Therefore,

For $U < 12.696$ classify the individual as coming from $\pi_1$ population.

For $U > 12.696$ classify the individual as coming from $\pi_2$ population.
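
Steps 4 and 5 amount to one matrix-vector product and two dot products. The following sketch starts from the printed inverse matrix and group means and reproduces the coefficients and the cutoff. The `classify` helper and the sample score vector are hypothetical additions for illustration.

```python
# Inverse matrix s^{ij} and group means as printed in Steps 2 and 3.
S_inv = [[0.0069286, -0.020692, -0.0030996],
         [-0.020692, 0.31019, 0.010756],
         [-0.0030996, 0.010756, 0.018459]]
mean_x = [37.188, 5.6979, 122.3438]   # unsatisfactory group (pi_1)
mean_y = [54.785, 7.4976, 127.6746]   # satisfactory group (pi_2)


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Step 4: coefficient of z_i is the sum over j of s^{ij} (ybar_j - xbar_j).
diff = [y - x for x, y in zip(mean_x, mean_y)]
coef = [dot(row, diff) for row in S_inv]

# Step 5: value of U at each group mean, and the midpoint used as cutoff.
alpha_1 = dot(coef, mean_x)
alpha_2 = dot(coef, mean_y)
cutoff = (alpha_1 + alpha_2) / 2


def classify(z):
    """Assign a score vector z = (z_1, z_2, z_3) to pi_1 or pi_2."""
    return "pi_1" if dot(coef, z) < cutoff else "pi_2"


# Hypothetical trainee: placement 40, high-school score 6, A.G.C.T. 120.
example = classify([40, 6, 120])
```

For the hypothetical trainee above, U is about 11.82, below the cutoff, so the rule assigns him to $\pi_1$.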

Step 6. Computation of the Efficiency of Classification . 

a = Aft - ft)(ft - ft) + Aft - ft)(ft - ft) + s n (yi - ft)(ft - ft) 

+ Aft - ft) (ft ~ ft) + Aft - ft) (ft - ft) + s 23 (ft - ft)(ft - ft) 

+ Aft - ftMft - ft) + Aft - ftXft - *2) + Aft - ftXft - ft) 

= 1.5764. 



where $P_1$ is the probability of making an error of Type I, that is, of classifying
an individual as one who will do satisfactory work when he actually does
unsatisfactory work; and $1 - P_2$ is the probability of making an error of Type II,
that is, of classifying a student as one who will do unsatisfactory work when he
actually does satisfactory work.

3. Conclusions. In using the above classification equation to classify the 
305 trainees used in this study, 21 errors of Type I were made or 22.9 percent, 
while 50 errors of Type II were made or 23.9 percent. These percentages seem 
reasonably close to the expected 20.6 percent. 


NOTE ON AN IDENTITY IN THE INCOMPLETE BETA FUNCTION 


By T. A. Bancroft 
Iowa State College 


Since the incomplete beta function has proved of some importance in statistics, 
it would appear that any additional information concerning its properties might 
at some time prove useful. In a paper by the author, [1], two identities in the 
incomplete beta function were incidentally obtained. They are as follows: 

(1)  $(p + q) I_x(p, q) = p I_x(p + 1, q) + q I_x(p, q + 1)$

and

(2)  $(p + q + 1)^{[2]} I_x(p, q) = (p + 1)^{[2]} I_x(p + 2, q) + 2pq I_x(p + 1, q + 1) + (q + 1)^{[2]} I_x(p, q + 2),$

where the incomplete beta function $I_x(p, q) = \dfrac{B_x(p, q)}{B(p, q)}$, etc., and $(p + 1)^{[2]}$,
etc. refer to the standard factorial notation.

Written in the above form these two identities suggest a possible general 
identity to which they belong as special cases. The third special case suggested is: 


(3)  $(p + q + 2)^{[3]} I_x(p, q) = (p + 2)^{[3]} I_x(p + 3, q) + 3 (p + 1)^{[2]} q I_x(p + 2, q + 1) + 3 p (q + 1)^{[2]} I_x(p + 1, q + 2) + (q + 2)^{[3]} I_x(p, q + 3).$

The general formula suggested is

(4)  $(p + q + n - 1)^{[n]} I_x(p, q) = \sum_{r=0}^{n} \binom{n}{r} (p + n - r - 1)^{[n-r]} (q + r - 1)^{[r]} I_x(p + n - r, q + r).$

To prove the general formula we write (4) as

(5)  $(p + q + n - 1)^{[n]} I_x(p, q) = \sum_{r=0}^{n} \binom{n}{r} (p + n - r - 1)^{[n-r]} (q + r - 1)^{[r]} \dfrac{B_x(p + n - r, q + r)}{B(p + n - r, q + r)}.$

By expanding and simplifying it is easy to show that

(6)  $\dfrac{(p + n - r - 1)^{[n-r]} (q + r - 1)^{[r]}}{B(p + n - r, q + r)} = \dfrac{(p + q + n - 1)^{[n]}}{B(p, q)}.$

Using (6) the right hand side of (5) reduces to

(7)  $\dfrac{(p + q + n - 1)^{[n]}}{B(p, q)} \sum_{r=0}^{n} \binom{n}{r} B_x(p + n - r, q + r).$

The summed function in (7) reduces to

(8)  $\int_0^x x^{p-1} (1 - x)^{q-1} [x + (1 - x)]^n \, dx = B_x(p, q),$


which proves the identity. 
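
The general identity (4) is easy to check numerically. The sketch below evaluates both sides for a few values of n, computing $B_x(p, q)$ with a composite Simpson rule. The choice of integrator, the restriction to $p, q \ge 1$, and the test values $x = 0.4$, $p = 2$, $q = 3.5$ are assumptions made only for this illustration.

```python
import math


def falling_factorial(a, k):
    """Factorial notation a^{[k]} = a (a - 1) ... (a - k + 1), with a^{[0]} = 1."""
    result = 1.0
    for step in range(k):
        result *= a - step
    return result


def incomplete_beta(x, p, q, intervals=2000):
    """B_x(p, q) by composite Simpson's rule; assumes p >= 1, q >= 1, 0 <= x < 1."""
    def integrand(t):
        return t ** (p - 1) * (1 - t) ** (q - 1)

    h = x / intervals  # `intervals` must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3


def complete_beta(p, q):
    """B(p, q) expressed through the gamma function."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def regularized(x, p, q):
    """I_x(p, q) = B_x(p, q) / B(p, q)."""
    return incomplete_beta(x, p, q) / complete_beta(p, q)


def identity_sides(x, p, q, n):
    """Left and right members of the general identity (4)."""
    lhs = falling_factorial(p + q + n - 1, n) * regularized(x, p, q)
    rhs = sum(
        math.comb(n, r)
        * falling_factorial(p + n - r - 1, n - r)
        * falling_factorial(q + r - 1, r)
        * regularized(x, p + n - r, q + r)
        for r in range(n + 1)
    )
    return lhs, rhs


checks = {n: identity_sides(0.4, 2.0, 3.5, n) for n in range(1, 5)}
```

For n = 1, 2, 3 the sum reproduces the special cases (1), (2), (3); in each entry of `checks` the two members agree up to the integration error.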

Although the general identity is quite simple to prove, it does not seem to 
have appeared in the literature. 


REFERENCE 

[1] Bancroft, T. A. “On biases in estimation due to the use of preliminary tests of
significance,” Annals of Math. Stat., Vol. 15 (1944), No. 2.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest 

Personal Items 

Archie Blake is now employed as a ballistician with the Ballistic Research 
Laboratory at Aberdeen Proving Ground. 

Robert V. Bonnar is now employed as Associate Technologist at the Mare 
Island Navy Yard. 

Professor W. G. Cochran has returned to his regular duties at Iowa State 
College. 

Mrs. Bianca Cody (Bianca Rivoli) is now Statistician for the James O. Peck 
Research Company, 12 East 41st Street, New York City. 

Associate Professor William Feller of Brown University has been appointed 
Professor of Mathematics at Cornell University. 

Professor John Kenney of the University of Wisconsin is now located at the 
Milwaukee branch of the University. 

Myra Levine is now Assistant Mathematical Statistician with the Statistical 
Research Group at Columbia University. 

Mrs. Harold Michaelis (Ruth E. Jolliffe) is 5th Naval District Statistician 
at the Naval Operating base in Norfolk, Va. 

Emma Spaney is Statistician for the Committee on Measurement of the 
National League of Nursing Education. 

Professor J. A. Shohat of the University of Pennsylvania died October 8, 1944. 

Mr. Redford T. Webster of the Western Electric Company died July 31,1944. 


New Members 

The following persons have been elected to membership in the Institute: 

Boddle, John B., Jr. Chief, Program Section, Budget Division, Washington, D. C. 2628 
Tunlaw Road, N.W. 

Bruner, Nancy M.A. (Iowa) Statistician, Western Auto Supply Co., Kansas City, Mo. 
7611 Main St. 

Christopher, Edward £. B.S. (Mass. Inst. Tech.) Statistician, Signal Corps. 6704 North
26th St., Arlington, Va.

Cowden, Dudley J. Ph.D. (Columbia) Prof. of Economics, Univ. of North Carolina.
Box 615, Chapel Hill, North Carolina.

Cynamon, Manuel M.S. (City Coll., N. Y.) Personnel Tech., Personnel Res. Sec., Adj.
General’s Office, War Dept. 10 Ave. P, Brooklyn 4, N. Y.

Evensen, Edward J. On military leave from Metropolitan Life Ins. Co. (Actuarial Sec.)
Sv. Co., 1st Sp. Sv. Force.

Green, Earl L. Ph.D. (Brown) 1st Lieut., A.C., Chief, Dept. of Statistics. AAF School
of Aviation Medicine, Randolph Field, Texas.

Groves, William Brewster B.S. (Antioch) Economist, Off. of Price Administration.
520 Decatur St., N.W., Washington, D. C.

Homseth, Richard Alien M.A. (Wisconsin) Res. Assistant in Sociology, Univ. of Wisconsin.
807 N. Randall, Madison 5, Wis.

Kinsler, David M. M.A. (Chicago) Chief, Analytical Section, Arms & Ammunition Division,
Aberdeen Proving Ground, Maryland.

Kopp, Paul J. M.A. (Duke) Major, Chemical Warfare Service, U. S. A. ISOS North
Adams St., Arlington, Va.

Massey, Frank Jones, Jr. M.A. (California) Associate, Dept. of Math., Univ. of California,
Berkeley, Calif. 1864 Union St., San Francisco 9, Calif.

Orcutt, Guy H. Ph.D. (Michigan) Instr. Economics Dept., Mass. Inst. of Tech., Cambridge,
Mass.

Rakesky, Sophie M.S. (Michigan) Statistician, W. K. Kellogg Foundation, Battle Creek,
Mich.

Roberts, Jean M.S. (Minnesota) Statistician, Child Welfare Res. Analyst. 989 Goodrich
Ave., St. Paul S, Minn.

Schletroma, William B.S.S. (Coll. of City of N. Y.) Research Assistant. SIS East 116th
St., New York, N. Y.

Schlorek, Mary A. A.B. (Adelphi) Research Statistician, National Broadcasting Co.,
30 Rockefeller Plaza, New York, N. Y.

deSousa, Alvaro Pedro B.E. (Liverpool) Vice-Governor, Banco de Portugal. Monserrate,
Rua Infante de Sagres, Estoril, Portugal.

Steele, Floyd George M.S. (Calif. Inst. of Tech.) Stat. Analyst, Douglas Aircraft. 18168
Roosevelt Highway, Pacific Palisades, Calif.

Thom, Herbert C. S. 6130 18th Rd., N., Arlington, Va. 

Report of the Fifth Pittsburgh Chapter Meeting 

The fifth meeting of the Pittsburgh Chapter of the Institute of Mathematical 
Statistics was held at Engineering Hall, Carnegie Institute of Technology on 
Saturday, November 25, 1944. The meeting was held as a joint session with the 
Pittsburgh Quality Control Society. Thirty-one persons attended the meeting, 
including the following six members of the Institute: 

George Eldredge, H. J. Hand, C. R. Mummery, E. G. Olds, E. M. Schrock, J. V. Sturtevant.

The following papers were presented, with Mr. J. V. Sturtevant, of the
Carnegie Illinois Steel Corporation, acting as chairman:

1. Modified Application of Control Chart to the Use of Gauges on Machine Tool Work. 

Dr. E. G. Olds, War Production Board, Washington, D. C. 

2. Application of Control Charts to Infrequent Inspection of Machine Operations. 

W. D. Angst, Thompson Aircraft Products Company, Cleveland, Ohio. 

3. Application of Control Chart Techniques to Checking Reproducibility of Chemical 
Analysis. 

H. A. Stobbs, Wheeling Steel Corporation, Steubenville, Ohio.

4. Statistical Principles of Experimental Design as Applied to Tests Conducted in
Manufacturing Operations.

Dr. B. Epstein, Westinghouse Electric & Manufacturing Co., East Pittsburgh, Pa. 

H. J. Hand,

Secretary-Treasurer, Pittsburgh Chapter

Educational Meetings of the Pittsburgh Chapter

The first of a series of educational meetings on methods of statistical
computations given by the Pittsburgh Chapter was held on Saturday afternoon, January
20, 1945. Thirty-three persons attended the meeting, including the following
three members of the Institute: 

Thomas A. Elkins, H. J. Hand, J. V. Sturtevant. 

The following program was presented: 

1. Potential Field for Industrial Applications of Statistical Method. 

H. J. Hand, National Tube Company, Pittsburgh, Pa.

2. Computations for Analysis of Variance and Experimental Design. 

Ben Epstein, Westinghouse Electric & Manufacturing Company, East Pittsburgh, Pa.

It is planned to hold these meetings bi-weekly, on Saturday afternoons for an 
indefinite period in the future. Topics to be considered in the series will include: 

1. Analysis of variance and covariance. 

2. Design of experiments. 

3. Tests of significance. 

4. Probability and probability distributions. 

5. Correlation and regression analysis, including the orthogonal coordinate method. 

6. Tests of increased severity. 

7. Sampling theory, including stratification. 

8. Acceptance-rejection mathematics, Dodge sampling inspection tables. 

9. Shewhart control chart techniques. 

10. Analysis of runs. 

11. Cycle analysis. 

12. Factor analysis. 


H. J. Hand, 

Secretary-Treasurer, Pittsburgh Chapter 



ANNUAL REPORT OF THE PRESIDENT OF THE INSTITUTE

Continuing the established tradition, the annual summer meeting was held at
Wellesley, Massachusetts, August 12-13, 1944 in conjunction with the Summer
Meetings of the American Mathematical Society and the Mathematical
Association of America. A regional meeting was held in Washington, May 6-7, in
conjunction with the meeting of the Washington Chapter of the American
Statistical Association. The programs were arranged by the Program
Committee: W. Feller, Chairman, W. G. Madow, and A. Wald.

Even though, under present war conditions, research in the field of probability 
and statistics is very much curtailed, enough papers in mathematical statistics 
of satisfactory quality have been proposed for publication in the Annals in 1944 
to keep the total volume of material at approximately five hundred pages or the 
level of the last few years. However, the outlook for a sufficient number of 
satisfactory papers to maintain the usual volume of publication during 1945 does 
not look quite so favorable. 

Looking into the future, the Institute must continue to furnish, through the
Annals, a medium for the publication of all important results of original research
in the field of mathematical statistics as they become available. To do otherwise
would be suicide. At the same time we must take account of the growing need
for comprehensive surveys of statistical theory on the part of other scientists,
including not only social scientists but also physicists, chemists, biologists, and
research engineers, whose interest in the contributions of mathematical statistics
has been greatly stimulated during the war. Only the mathematical statistician
of broad competence can provide adequate critical surveys of this character.
Perhaps some of this need can be met through survey articles published in the
Annals, although it is not an easy matter to get capable men to do such work.
Perhaps the time is not far off when the Institute must stimulate the preparation
of such material by instituting an annual series of Colloquium Lectures patterned
somewhat after those of the Mathematical Society, which could be published
separately.

This is but one of many problems that the Institute faces in its post-war 
development. Not only must it assume the responsibility of stimulating and 
encouraging research and of publishing the results; it must also consider the 
problem of training the research statistician of tomorrow as well as those who 
are to apply mathematical statistics in the many fields of science. It also must 
assume some responsibility for keeping in contact with other scientists in order 
that the mathematical statistician may become acquainted with the unsolved 
statistical problems of the scientist. There are also many problems of a
professional character that face the mathematical statistician in the future if he is
to succeed in developing the profession of mathematical statistics to the level 
attained by some of the older scientific professions. 

With the realization of the need for a concerted attack on some of these
problems, the Board of Directors at its meeting in May set up two committees,
one on Training and Placement of Statisticians under Harold Hotelling and the
other on Post-War Development of the Institute under W. G. Cochran.
Interim reports received by the Board from both committees indicate that
considerable progress has been made to date. They also indicate, however, that much
more work remains to be done.

At the same meeting of the Board, a Budget and Finance Committee was set 
up, consisting of P. S. Dwyer, Chairman, C. H. Fischer, A. C. Olshen, and C. F. 
Roos, to prepare a report on the policy that should be followed by the Institute 
in respect to such items as investment of funds, advertising, preparation of an 
annual statement, and the like. Some of the work of this committee has already 
borne fruit, as, for example, in providing the actuarial basis for life membership 
adopted at the Wellesley meeting and in establishing certain principles to be 
used in conducting the business of the Institute. 

A report of the Committee on Membership, W. G. Cochran, Chairman, P. S.
Dwyer, and T. Koopmans, appears elsewhere in this issue of the Annals. Upon
recommendation of this committee, the Board of Directors elected nine new
fellows: Walter Bartky, C. I. Bliss, Gertrude M. Cox, P. A. Horst, M. G.
Kendall, H. B. Mann, E. S. Pearson, Henry Scheffé, and W. A. Wallis.

The nominating committee for the year consisted of John Curtiss, Chairman, 
E. G. Olds, and F. F. Stephan. G. W. Snedecor served the Institute again as its 
representative on the Council of the A.A.A.S. 

The annual election of the Institute just concluded by mail ballot resulted 
in the election of the following officers for 1945: W. E. Deming, President; W. G. 
Cochran, and J. L. Doob, Vice-Presidents. 

Walter A. Shewhart 
President, 1944

February 10, 1945 



ANNUAL REPORT OF THE SECRETARY-TREASURER 
OF THE INSTITUTE 

Accounts of the 1944 meetings of the Institute (the Wellesley meeting, the
Washington regional meeting, and the Pittsburgh chapter meetings) have
appeared in appropriate issues of the Annals.

At the Wellesley meeting a number of amendments to the Constitution and 
By-Laws were passed. These were published in the September, 1944, issue of 
the Annals. (The amended Constitution and By-Laws appear elsewhere in this 
issue.) 

Due to a large extent to the cooperation of the membership in sending in
nominations, the Institute enjoyed a large increase in membership during the year.
There were some resignations and it was necessary to suspend fifteen persons at 
the end of 1944 because of failure to pay dues. It is apparent that, in some of 
these cases at least, our mail is not being received. Undoubtedly some of these 
memberships will be restored when contact is again established. As of January 
1, 1945, there were 606 members, a net gain of approximately one hundred 
members. 

During the year the Institute received gifts from Professor Harry Carver in
the form of exchanges for early issues of the Annals, reprints of early articles, etc.

The Secretary-Treasurer wishes to acknowledge the continued assistance of 
Professor Lloyd Knowler in looking after the back issues of the Annals which are 
stored at Iowa City. 

The following financial statement covers the period from December 22, 1943 
to December 31, 1944 (the books and records of the Treasurer have been audited 
by Professor Thomas A. Bickerstaff and were found to be in agreement with the 
statement as submitted): 

FINANCIAL STATEMENT
December 22, 1943, to December 31, 1944

Receipts

Balance on Hand, December 22, 1943 ..........................   $3,715.05
Dues
    1944 and before ......................  $2,995.31
    1945 and 1946 ........................   1,127.00
    Life .................................     330.00
                                                                 4,452.31
Subscriptions
    1944 and before ......................  $1,301.94
    1945 and 1946 ........................     883.94
                                                                 2,185.88
Sale of Back Numbers ........................................    1,385.02
Miscellaneous ...............................................        6.15
Total Receipts ..............................................  $11,744.41

Expenditures

Annals: Current
    Office of Editor .....................    $273.77
    Waverly Press ........................   3,448.61
                                                                 3,722.28
Annals: Back Numbers
    Purchase from H. C. Carver ...........    $149.40
    Iowa City Office .....................      96.26
                                                                   246.66
Office of Secretary-Treasurer
    Printing, mimeographing, programs, etc.
      (including stamped envelopes) ......    $377.00
    Postage and supplies .................      68.02
    Clerical help ........................     465.94
    Moving office from Pittsburgh ........      66.79
                                                                   966.75
Miscellaneous ...............................................       29.07
Balance on Hand, December 31, 1944 ..........................    6,790.65
                                                               $11,744.41

No unpaid bills were in the hands of the Treasurer as of December 31, 1944,
and aside from an additional $100.00 which the Board has designated for Annals
expense for 1944, there were no large bills outstanding.

Accounts receivable as of December 31, 1944, amounted to $303.73. Many
of these accounts are current accounts while some of the older ones are accounts
with firms in India, which probably will be collected eventually.

The American Library Association continued with its purchase of thirty sets
of Volume XV of the Annals (for post war distribution) and the Universal
Trading Corporation (representing the Chinese Government) purchased twenty
sets of Volumes 11-17 inclusive. These orders contributed in no small way to
the total 1944 income of $8,029.36.

The 1944 balance $6,790.65 (consisting of bank balance of $3,790.65 and
$3,000.00 in government bonds) is $3,075.60 higher than it was on December 21,
1943. This increase is due in part to 1944 business and in part to the fact that
unusually large payments toward future business, such as the $330.00 in life
payments and the $1,127.00 in 1945 and 1946 dues, have been made.

To summarize the situation briefly, the Institute’s 1944 activity has resulted
in a gain of approximately $1,500.00 and we are about this much in advance
of our usual position with reference to the payments of following years.

Paul S. Dwyer
Secretary-Treasurer.

December 31, 1944













REPORT OF THE MEMBERSHIP COMMITTEE OF THE INSTITUTE 

Since the duties of this Committee are not defined in detail in the Constitution, 
the Board of Directors asked the Committee to prepare a statement describing 
the appropriate composition and function of the Committee on Membership. 
This work resulted in the preparation of amendments to the Constitution and 
By-laws. These amendments were passed at the business meeting at Wellesley 
College on August 13, 1944, and are printed in full in the September, 1944, issue 
of the Annals (p. 340). 

In brief, the duties of the Committee are specified as follows in these amendments: 

(a) The Committee holds the power of election to the grades of Member and 
Junior Member and makes recommendations to the Board of Directors with 
reference to placing members in the other grades of membership. 

(b) It is the duty of the Committee to prepare and make available through 
the Secretary-Treasurer an announcement of the qualifications necessary for 
the different grades of membership and to review these qualifications periodically. 

(c) The Committee considers plans for increasing the number of applicants 
for membership. 

As permitted by the amendments referred to above, the power of election to 
the grades of Member and Junior Member was delegated by the Committee in 
August, 1944, to the Secretary-Treasurer, subject to certain reservations. The 
statement of qualifications for the different grades of membership as mentioned 
in (b) above is published below. At the August 13 meeting of the Board of 
Directors it was decided that no elections should be made at present to the grades 
of Honorary Member and Sustaining Member. 

On the recommendation of the Membership Committee the following members 
were elected as Fellows by the Board of Directors: W. Bartky, C. I. Bliss, G. M. 
Cox, P. A. Horst, M. G. Kendall, H. B. Mann, E. S. Pearson, H. Scheffé, W. A. 
Wallis. 


Statement of Qualifications for the Different Grades of 
Membership in the Institute of 
Mathematical Statistics 

Member. The candidate shall either (a) be actively engaged in or show a 
serious interest in mathematical statistics, or (b) be interested in some applied 
field of statistics, with a desire to keep himself informed regarding recent developments
in mathematical theory and techniques. 

Junior Member. 

1. Any undergraduate student of a collegiate institution is eligible for election 
as a Junior Member of the Institute of Mathematical Statistics provided that he 
or she is sponsored by a member of the Institute. 

2. The annual dues ($2.50) must be submitted with the application. 

3. Annual membership shall coincide with the calendar year and the Junior 
Member shall receive a complete volume of the Annals of Mathematical Statistics 
for the year in which he or she is elected. 

4. Junior Membership shall be limited to a term of two years, but a Junior 
Member may apply for transfer to ordinary membership at the beginning of his 
second year. 

Fellow. 

1. The candidate shall have evidenced continuing activity in research in 
mathematical statistics by publication beyond his doctor’s dissertation of independent
work of merit. Normally two or three worthwhile papers beyond the 
dissertation will be required to establish this fact. 

2. The first qualification may be partly or wholly waived in the case of (a) 
a candidate of well-established leadership among mathematical statisticians whose 
contributions to the development of the field of mathematical statistics other 
than sufficient published original research shall be judged of equal value or (b) 
a candidate of well-established leadership in the applications of mathematical 
statistics, whose work has contributed greatly to the utility of and the appreciation
for mathematical statistics. 

Honorary Member. A person of exceptional ability and acknowledged leadership
in the field of mathematical statistics may be elected to the grade of Honorary
Member by the Board of Directors, upon the recommendation of the 
Committee on Membership. 

Sustaining Member. The Board of Directors shall have the power to elect to 
Sustaining Membership any individual, group or corporation that is interested 
in furthering the purposes for which the Institute was formed. 

W. G. Cochran ( Chairman ) 
W. E. Deming 
P. S. Dwyer 
T. Koopmans 


February 10, 1945 



PROGRESS REPORT OF THE COMMITTEE ON POST-WAR 
DEVELOPMENT OF THE INSTITUTE 

In considering the post-war development of the Institute of Mathematical 
Statistics, the Committee has recognized two general problems: 

A. The problem of what additional activities the Institute should undertake 
in order to provide further stimulus to the development of the field of 
mathematical statistics. 

B. The problem of determining how the Institute can cooperate more effectively
with the users of statistical techniques. 

Because of rapidly increasing interest in the application of statistical methods 
in many different fields, the Committee has directed most of its attention thus 
far to Problem B; the present progress report is concerned with the work of the 
Committee on this problem. The Committee hopes to submit a report on 
Problem A at the end of 1945. 

With respect to Problem B, it is the opinion of the Committee that a central 
organization for the statistical societies should be of common interest. Accordingly,
a plan was worked out and submitted to the Board of Directors of the 
Institute at the Wellesley meeting of the Institute. This proposal and its 
present status are discussed below. 

We believe that there is much to be gained from an organization that would 
form a link between the various statistical societies, and would have the following 
principal aims: 

(1) To represent the members of the societies in all matters of common interest. 

(2) To promote cooperation between statisticians working in the different 
fields of application, and between mathematical statistics, applied statistics,
scientific research and the industries. 

(3) To develop amongst the public an appreciation of the value of the statistical
method in scientific inquiry. 

It is our opinion that an organization similar to that of the Institute of Physics 
would be suitable. The statistical societies, while retaining their present autonomies,
would become founding members of a corporation whose governing 
board would contain representatives from each society. In pursuance of its aims 
as outlined above, the new organization might: 

(a) Take the lead in formulating policies on questions which concern all 
statisticians. 

(b) Publish a journal of general interest to statisticians and undertake the 
routine work in connection with the publication of the journals of the 
individual societies, the societies retaining in full their present responsibility
for the contents of their journals. 

(c) Arrange joint meetings between different statistical societies and between 
statistical and other scientific societies. 

(d) Assist new groups in organizing for their benefit, either under the auspices 
of one of the present societies or in a new society, which might at first be 
given associate membership and later full membership of the central 
organization. 

(e) Take steps to bring news about the use of statistics in scientific research 
to the attention of the public and more particularly of leaders in industry, 
in federal, state and local agencies and in education. 

(f) Investigate the demands for various types and degrees of statistical 
training, outline courses of training in statistics suitable for meeting these 
demands and make strenuous efforts to have the recommended courses 
of training put into effect, in order that statisticians can be of fullest 
service in the nation’s work. In this connection an information and 
placement bureau may be an appropriate auxiliary. 

(g) Institute an abstracting service in statistical methodology. This might 
take the form of a periodical publication of abstracts of papers with respect 
to their methodological content rather than their subject matter. The 
coverage would include journals of business, marketing, engineering, 
medicine and agriculture as well as purely statistical publications. 

The financial needs of the new organization, which would maintain a paid 
full-time staff, may be met initially by contributions from the present societies. 
In view of the extra services which would be rendered to statisticians, some 
increase in the subscription rates of the present societies appears reasonable. A 
member who belongs to more than one of the present societies would pay the 
extra amount only once. Supplementary income might be derived from advertising
in the journal of the central organization and from the establishment 
of sustaining or corporate memberships in the central organization. 

At the time of the Wellesley meeting of the Board, there had been only informal
contacts between members of this Committee and members of other 
statistical societies. We considered it our first task to obtain some consensus of 
opinion from the standpoint of the Institute of Mathematical Statistics. Following
general approval by the Board of Directors of the Institute, members of 
the Committee discussed the proposal for a central organization with representatives
of several other statistical societies. The American Statistical Association 
has a Committee to consider the future structure of the Association and this 
Committee brought the Institute proposal before the Board of Directors of the 
Association for action. As the oldest of the statistical societies, the American 
Statistical Association then invited participation in an intersociety committee 
by the Institute and nine other societies or sections, directly or indirectly concerned
with statistical method. This committee is to explore the possibilities of 
coordinating the activities of the several statistical societies and report its 
recommendations back to each organization. The representatives have now 
been named and the first meeting was held on February 10, 1945, in New York. 
At this meeting the Institute was represented by W. G. Cochran and Lt. John 
H. Curtiss. 

With regard to the problem of what additional activities the Institute should 
undertake in order to furnish additional stimulation to the development of the 
field of mathematical statistics, the Committee has discussed several ideas which 
appear promising. It is hoped to present a complete report on this phase of the 
Committee’s work at the end of this year. 

C. I. Bliss 

W. G. Cochran ( Chairman ) 
W. E. Deming 
P. S. Olmstead 
S. S. Wilks 


February 12, 1945 



CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Junior Members, Fellows, 
Honorary Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior 
Members excepted, who have been members for twenty-three months prior to the date of 
voting. 

3. No person shall be a Junior Member of the Institute for more than a limited term as 
determined by the Committee on Membership and approved by the Board of Directors. 

ARTICLE III 

Officers, Board of Directors, and Committee on Membership 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secretary-Treasurer.
The terms of office of the President and Vice-Presidents shall be one year 
and that of the Secretary-Treasurer three years. Elections shall be by majority ballots at 
Annual Meetings of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the individuals
present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers, the two previous 
Presidents, and the Editor of the Official Journal of the Institute. 

3. The Institute shall have a Committee on Membership composed of a Chairman and 
three Fellows. At their first meeting subsequent to the Adoption of this Constitution, the 
Board of Directors shall elect three members as Fellows to serve as the Committee on 
Membership, one member of the Committee for a term of one year, another for a term of 
two years, and another for a term of three years. Thereafter the Board of Directors shall 
elect from among the Fellows one member annually at their first meeting after their election
for a term of three years. The President shall designate one of the Vice-Presidents as 
Chairman of this Committee. 


ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the date 
set for the meeting. All meetings except executive sessions shall be open to the public. 
Only papers accepted by a Program Committee appointed by the President may be presented
to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board may 
be held from time to time at the call of the President or any two members of the Board. 
Notice of each meeting of the Board, other than the two regular meetings, together with a 
statement of the business to be brought before the meeting, must be given to the members 
of the Board by the Secretary-Treasurer at least five days prior to the date set therefor. 
Should other business be passed upon, any member of the Board shall have the right to 
reopen the question at the next meeting. 

3. Meetings of the Committee on Membership may be held from time to time at the call 
of the Chairman or any member of the Committee provided notice of such call and the 
purpose of the meeting is given to the members of the Committee by the Secretary- 
Treasurer at least five days before the date set therefor. Should other business be passed 
upon, any member of the Committee shall have the right to reopen the question at the 
next meeting. Committee business may also be transacted by correspondence if that 
seems preferable. 

4. At a regularly convened meeting of the Board of Directors, four members shall constitute
a quorum. At a regularly convened meeting of the Committee on Membership, 
two members shall constitute a quorum. 

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the 
Board of Directors of the Institute. The term of office of the Editor may be terminated at 
the discretion of the Board of Directors. 

2. Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 
Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly 
convened meeting of the Institute provided notice of such proposed amendment shall have 
been sent to each voting member by the Secretary-Treasurer at least thirty days before the 
date of the meeting at which the proposal is to be acted upon. Voting may be in person or 
by mail. 



BY-LAWS 

ARTICLE I 

Duties of the Officers, the Editor, Board of Directors, and Committee on Membership 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, shall 
preside at the meetings of the Institute and of the Board of Directors. At meetings of the 
Institute, the presiding officer shall vote only in the case of a tie, but at meetings of the 
Board of Directors he may vote in all cases. At least three months before the date of the 
annual meeting, the President shall appoint a Nominating Committee of three members. 
It shall be the duty of the Nominating Committee to make nominations for Officers to be 
elected at the annual meeting and the Secretary-Treasurer shall notify all voting members 
at least thirty days before the annual meeting. Additional nominations may be submitted
in writing, if signed by at least ten Fellows of the Institute, up to the time of the 
meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings at 
the meetings of the Institute and of the Board of Directors, send out calls for said meetings 
and, with the approval of the President and the Board, carry on the correspondence of the 
Institute. Subject to the direction of the Board, he shall have charge of the archives and 
other tangible and intangible property of the Institute, and once a year he shall publish in 
the Annals of Mathematical Statistics a classified list of all Members and Fellows of the 
Institute. He shall send out calls for annual dues and acknowledge receipt of same; pay 
all bills approved by the President for expenditures authorized by the Board or the Institute;
keep a detailed account of all receipts and expenditures, prepare a financial statement 
at the end of each year and present an abstract of the same at the annual meeting of the 
Institute after it has been audited by a Member or Fellow of the Institute appointed by the 
President as Auditor. The Auditor shall report to the President. 

3. Subject to the direction of the Board, the Editor shall be charged with the responsibility
for all editorial matters concerning the editing of the Annals of Mathematical Statistics.
He shall, with the advice and consent of the Board, appoint an Editorial Committee
of not less than twelve members to co-operate with him; four for a period of five years, 
four for a period of three years, and the remaining members for a period of two years, appointments
to be made annually as needed. All appointments to the Editorial Committee
shall terminate with the appointment of a new Editor. The Editor shall serve as 
editorial adviser in the publication of all scientific monographs and pamphlets authorized 
by the Board. 

4. The Board of Directors shall have charge of the funds and of the affairs of the Institute,
with the exception of those affairs specifically assigned to the President or to the 
Committee on Membership. The Board shall have authority to fill all vacancies ad interim,
occurring among the Officers, Board of Directors, or in any of the Committees. The 
Board may appoint such other committees as may be required from time to time to carry 
on the affairs of the Institute. The power of election to the different grades of Membership,
except the grades of Member and Junior Member, shall reside in the Board. 

5. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the different
grades of membership. The Committee shall review these qualifications periodically 
and shall make such changes in these qualifications and make such recommendations with 
reference to the number of grades of membership as it deems advisable. The power to 
elect worthy applicants to the grades of Member and Junior Member shall reside in the 
Committee, which may delegate this power to the Secretary-Treasurer, subject to such 
reservations as the Committee considers appropriate. The Committee shall make recommendations
to the Board of Directors with reference to placing members in other grades 
of membership. The Committee shall give its attention to the question of increasing the 
number of applicants for membership and shall advise the Secretary-Treasurer on plans 
for that purpose. 


ARTICLE II 
Dues 

1. Members shall pay five dollars at the time of admission to membership and shall receive 
the full current volume of the Official Journal. Thereafter, Members shall pay five dollars
annual dues. The annual dues of Junior Members shall be two dollars and fifty cents. 
The annual dues of Fellows shall be five dollars. The annual dues of Sustaining Members 
shall be fifty dollars. Honorary Members shall be exempt from all dues. 

(a) Exception. In the case that two Members of the Institute are husband and wife 
and they elect to receive between them only one copy of the Official Journal, the annual 
dues of each shall be three dollars and seventy-five cents. 

(b) Exception. Any Member or Fellow may make a single payment which will be 
accepted by the Institute in place of all succeeding yearly dues and which will not otherwise 
alter his status as a Member or Fellow. The amount of this payment will depend upon 
the age of this Member or Fellow and will be based upon a suitable table and rate of interest,
to be specified by the Board of Directors. 

(c) Exception. Any Member or Junior Member of the Institute serving, except as a 
commissioned officer, in the Armed Forces of the United States or of one of its allies, may 
upon notification to the Secretary-Treasurer be excused from the payment of dues until the 
January first following his discharge from the Service. He shall have all privileges of 
membership except that he shall not receive the Official Journal. However during the first 
year of his resumed regular membership he may have the right to purchase, at $2.60 per 
volume, one copy of each volume of the Official Journal published during the period of his 
service membership. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow, Member, or Junior Member include a subscription to the 
Official Journal. The annual dues of a Sustaining Member include two subscriptions to 
the Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such notice by a copy of this Article. If 
such person fail to pay such dues within three months from the date of mailing such notice, 
the Secretary-Treasurer shall report the delinquent one to the Board of Directors, by whom 
the person’s name may be stricken from the rolls and all privileges of membership withdrawn.
Such person may, however, be re-instated by the Board of Directors upon payment
of the arrears of dues. 

ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any committee. 


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amendment
has been previously approved by the Board of Directors. 



A. Introduction 

By a sequential test of a statistical hypothesis is meant any statistical test 
procedure which gives a specific rule, at any stage of the experiment (at the 
n-th trial for each integral value of n), for making one of the following three 
decisions: (1) to accept the hypothesis being tested (null hypothesis), (2) to 
reject the null hypothesis, (3) to continue the experiment by making an additional
observation. Thus, such a test procedure is carried out sequentially. 
On the basis of the first trial, one of the three decisions mentioned above is made. 
If the first or the second decision is made, the process is terminated. If the 
third decision is made, a second trial is performed. Again on the basis of the 
first two trials one of the three decisions is made and if the third decision is 
reached a third trial is performed, etc. This process is continued until either 
the first or the second decision is made. 

An essential feature of the sequential test, as distinguished from the current 
test procedure, is that the number of observations required by the sequential 
test is not predetermined, but is a random variable due to the fact that at any 
stage of the experiment the decision of terminating the process depends on the 
results of the observations previously made. The current test procedure may 
be considered a limiting case of a sequential test in the following sense: For any 
positive integer n less than some fixed positive integer N, the third decision is 
always taken at the n-th trial irrespective of the results of these first n trials. 
At the N-th trial either the first or the second decision is taken. Which decision 
is taken will depend, of course, on the results of the N trials. 
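
As an illustration only, the three-decision procedure and its fixed-sample limiting case can be sketched in Python. The names `run_sequential_test`, `running_sum_rule` and `fixed_sample_rule`, and the toy stopping rule itself, are hypothetical and are not taken from the paper; any rule that maps the observations seen so far to one of the three decisions could be substituted.

```python
from typing import Callable, Iterable, List, Tuple

ACCEPT, REJECT, CONTINUE = "accept", "reject", "continue"
Rule = Callable[[List[float]], str]


def run_sequential_test(observations: Iterable[float], rule: Rule) -> Tuple[str, int]:
    """Apply `rule` after each new observation until it accepts or rejects.

    Returns the decision and the number of observations actually used,
    which is a random quantity because it depends on the data.
    """
    seen: List[float] = []
    for x in observations:
        seen.append(x)
        decision = rule(seen)
        if decision != CONTINUE:
            return decision, len(seen)
    # The data ran out before a terminal decision was reached.
    return CONTINUE, len(seen)


def running_sum_rule(seen: List[float]) -> str:
    """Hypothetical rule: stop as soon as the running sum leaves (-2, 2)."""
    total = sum(seen)
    if total >= 2:
        return REJECT
    if total <= -2:
        return ACCEPT
    return CONTINUE


def fixed_sample_rule(N: int, final_rule: Rule) -> Rule:
    """Limiting case: always continue before the N-th trial, then decide."""
    def rule(seen: List[float]) -> str:
        if len(seen) < N:
            return CONTINUE
        return final_rule(seen)
    return rule
```

With the toy rule, `run_sequential_test([1, -1, 1, 1, 1], running_sum_rule)` stops at the fourth observation, whereas a rule built by `fixed_sample_rule(N, ...)` never stops before the N-th observation whatever the data are.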

In a sequential test, as well as in the current test procedure, we may commit 
two kinds of errors. We may reject the null hypothesis when it is true (error 
of the first kind), or we may accept the null hypothesis when some alternative 
hypothesis is true (error of the second kind). Suppose that we wish to test the 
null hypothesis $H_0$ against a single alternative hypothesis $H_1$, and that we want 
the test procedure to be such that the probability of making an error of the 
first kind (rejecting $H_0$ when $H_0$ is true) does not exceed a preassigned value $\alpha$, 
and the probability of making an error of the second kind (accepting $H_0$ when 
$H_1$ is true) does not exceed a preassigned value $\beta$. Using the current test procedure,
i.e., a most powerful test for testing $H_0$ against $H_1$ in the sense of the 
Neyman-Pearson theory, the minimum number of observations required by the 
test can be determined as follows: For any given number N of observations a 
most powerful test is considered for which the probability of an error of the first 
kind is equal to $\alpha$. Let $\beta(N)$ denote the probability of an error of the second 
kind for this test procedure. Then the minimum number of observations is 
equal to the smallest positive integer N for which $\beta(N) \le \beta$. 
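
To make the determination of the minimum N concrete, the following sketch works through one special case that this introduction does not itself specify: independent normal observations with known standard deviation `sigma`, testing a mean of `theta0` against a mean of `theta1 > theta0`. Under that assumption the most powerful test of size alpha rejects for large sample means, and the second-kind error is beta(N) = Phi(z - (theta1 - theta0) * sqrt(N) / sigma), where z is the upper alpha point of the standard normal distribution. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta_of_N(N: int, theta0: float, theta1: float, sigma: float, alpha: float) -> float:
    """Second-kind error of the size-alpha most powerful test based on N observations.

    Assumes normal data with known sigma and theta1 > theta0 (an assumption
    made for this example only).
    """
    std = NormalDist()                 # standard normal distribution
    z = std.inv_cdf(1 - alpha)         # upper alpha point
    return std.cdf(z - (theta1 - theta0) * sqrt(N) / sigma)


def minimum_sample_size(theta0: float, theta1: float, sigma: float,
                        alpha: float, beta: float) -> int:
    """Smallest positive integer N with beta(N) <= beta."""
    if theta1 <= theta0 or sigma <= 0:
        raise ValueError("this sketch assumes theta1 > theta0 and sigma > 0")
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    N = 1
    # beta(N) decreases to zero as N grows, so the search terminates.
    while beta_of_N(N, theta0, theta1, sigma, alpha) > beta:
        N += 1
    return N
```

For example, with `theta0 = 0`, `theta1 = 0.5`, `sigma = 1` and `alpha = beta = 0.05`, the search returns N = 44: 43 observations leave beta(N) slightly above 0.05.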

In this paper a particular test procedure, called the sequential probability 
ratio test, is devised and shown to have certain optimum properties (see section 
4.7). The sequential probability ratio test in general requires an expected number
of observations considerably smaller than the fixed number of observations 
needed by the current most powerful test which controls the errors of the first 
and second kinds to exactly the same extent (has the same $\alpha$ and $\beta$) as the sequential
test. The sequential probability ratio test frequently results in a 
saving of about 50% in the number of observations as compared with the current
most powerful test. Another surprising feature of the sequential probability
ratio test is that the test can be carried out without determining any 
probability distributions whatsoever. In the current procedure the test can be 
carried out only if the probability distribution of the statistic on which the test 
is based is known. This is not necessary in the application of the sequential 
probability ratio test, and only simple algebraic operations are needed for carrying
it out. Distribution problems arise in connection with the sequential probability
ratio test only if we want to make statements about the probability distribution
of the number of observations required by the test. 
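
The remark that only simple algebraic operations are needed can be illustrated for a binomial (success or failure) observation. The sketch below is not taken from this introduction: it assumes the usual approximate stopping bounds for the probability ratio, A = (1 - beta)/alpha and B = beta/(1 - alpha), and the function name and arguments are hypothetical. At each trial the logarithm of the probability ratio is updated by addition and compared with the two bounds.

```python
from math import log
from typing import Iterable, Tuple


def sprt_bernoulli(observations: Iterable[int], p0: float, p1: float,
                   alpha: float, beta: float) -> Tuple[str, int]:
    """Sequential probability ratio test of p = p0 against p = p1.

    Uses the approximate bounds A = (1 - beta)/alpha and B = beta/(1 - alpha)
    (an assumption of this sketch). Returns the decision about the null
    hypothesis p = p0 and the number of observations used.
    """
    upper = log((1 - beta) / alpha)      # reject the null hypothesis at or above this
    lower = log(beta / (1 - alpha))      # accept the null hypothesis at or below this
    step_success = log(p1 / p0)          # contribution of a success
    step_failure = log((1 - p1) / (1 - p0))  # contribution of a failure

    log_ratio = 0.0
    n = 0
    for x in observations:
        n += 1
        log_ratio += step_success if x else step_failure
        if log_ratio >= upper:
            return "reject", n
        if log_ratio <= lower:
            return "accept", n
    # The data ran out before either bound was crossed.
    return "continue", n
```

No sampling distribution of a test statistic is computed anywhere in this procedure; with `p0 = 0.1`, `p1 = 0.3` and `alpha = beta = 0.05`, a run of three successes is already enough to reject, while an unbroken run of failures leads to acceptance at the twelfth trial.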

This paper consists of two parts. Part I deals with the theory of sequential 
tests for testing a simple hypothesis against a single alternative. In Part II a 
theory of sequential tests for testing simple or composite hypotheses against 
infinite sets of alternatives is outlined. The extension of the probability ratio 
test to the case of testing a simple hypothesis against a set of one-sided alternatives
is straightforward and does not present any difficulty. Applications to 
testing the means of binomial and normal distributions, as well as to testing 
double dichotomies are given. The theory of sequential tests of hypotheses 
with no restrictions on the possible values of the unknown parameters is, however,
not as simple. There are several unsolved problems in this case and it is 
hoped that the general ideas outlined in Part II will stimulate further research. 

Sections 5.2, 5.3 and 5.4 in Part II deal with the applications of the sequential 
probability ratio test to binomial distributions, double dichotomies and normal 
distributions. These sections are nearly self-contained and can be understood 
without reading the rest of the paper. Thus, readers who are primarily interested
in these special cases of the sequential probability ratio test rather than 
in the general theory, may profitably read only the above mentioned sections. 
For the benefit of readers who lack a sufficient background in the mathematical 
theory of statistics the exposition in sections 5.2, 5.3 and 5.4 is kept on a fairly 
elementary level. 

It should be pointed out that whenever the number of observations on which 
the test is based is for some reason determined in advance, for instance, if certain 
data are available from past history and no additional data can be obtained, then 
the current most powerful test procedure is preferable. The superiority of the 
sequential probability ratio test is due to the fact that it requires a smaller expected
number of observations than the current most powerful test. This 
feature of the sequential probability ratio test is, however, of no value if the number
of observations is for some reason determined in advance. 

B. Historical Note 

To the best of the author’s knowledge the first idea of a sequential test, i.e.,
a test where the number of observations is not predetermined but is dependent
on the outcome of the observations, goes back to H. F. Dodge and H. G. Romig
who proposed a double sampling inspection procedure [1]. In this double
sampling scheme the decision whether a second sample should be drawn or not
depends on the outcome of the observations in the first sample. The reason for
introducing a double sampling method was, of course, the recognition of the fact
that double sampling results in a reduction of the amount of inspection as
compared with “single” sampling.

The double sampling method does not fully take advantage of sequential 
analysis, since it does not allow for more than two samples. A multiple sampling 
scheme for the particular case of testing the mean of a binomial distribution was 
proposed and discussed by Walter Bartky [2]. His procedure is closely related 
to the test which results from the application of the sequential probability ratio 
test to testing the mean of a binomial distribution. Bartky clearly recognized 
the fact that multiple sampling results in a considerable reduction of the average 
amount of inspection. 

The idea of chain experiments discussed briefly by Harold Hotelling [3] is also 
somewhat related to our notion of sequential analysis. An interesting example 
of such a chain of experiments is the series of sample censuses of area of jute in 
Bengal carried out under the direction of P. C. Mahalanobis [6]. The successive
preliminary censuses, steadily increasing in size, were primarily designed to
obtain some information as to the parameters to be estimated so that an efficient 
design could be set up for the final sampling of the whole immense jute area in 
the province. 

In March 1943, the problem of sequential analysis arose in the Statistical
Research Group, Columbia University,¹ in connection with a specific question
posed by Captain G. L. Schuyler of the Bureau of Ordnance, Navy Department.
It was pointed out by Milton Friedman and W. Allen Wallis that the mere notion
of sequential analysis could slightly improve the efficiency of some current most
powerful tests. This can be seen as follows: Suppose that $N$ is the planned
number of trials and $W_N$ is a most powerful critical region based on $N$
observations. If it happens that on the basis of the first $n$ trials ($n < N$) it is already
certain that the completed set of $N$ trials must lead to a rejection of the null
hypothesis, we can terminate the experiment at the $n$-th trial and thus save some
observations. For instance, if $W_N$ is defined by the inequality $x_1^2 + \cdots + x_N^2 \geq c$,
and if for some $n < N$ we find that $x_1^2 + \cdots + x_n^2 \geq c$, we can terminate the
process at this stage. Realization of this naturally led Friedman and Wallis to
the conjecture that modifications of current tests may exist which take advantage
of sequential procedure and effect substantial improvements. More specifically,
Friedman and Wallis conjectured that a sequential test may exist that controls
the errors of the first and second kinds to exactly the same extent as the current
most powerful test, and at the same time requires an expected number of
observations substantially smaller than the number of observations required by the
current most powerful test.²

¹ The Statistical Research Group operates under a contract with the Office of Scientific
Research and Development and is directed by the Applied Mathematics Panel of the
National Defense Research Committee.

It was at this stage that the problem was called to the attention of the author 
of the present paper. Since infinitely many sequential test procedures exist, 
the first and basic problem was, of course, to find the particular sequential test 
procedure which is most efficient, i.e., which effects the greatest possible saving 
in the expected number of observations as compared with any other (sequential 
or non-sequential) test. In April, 1943 the author devised such a test, called 
the sequential probability ratio test, which for all practical purposes is most 
efficient when used for testing a simple hypothesis $H_0$ against a single
alternative $H_1$.

Because of the substantial savings in the expected number of observations 
effected by the sequential probability ratio test, and because of the simplicity 
of this test procedure in practical applications, the National Defense Research 
Committee considered these developments sufficiently useful for the war effort 
to make it desirable to keep the results out of the reach of the enemy, at least for 
a certain period of time. The author was, therefore, requested to submit his 
findings in a restricted report [7] which was dated September, 1943.³ In this
report the sequential probability ratio test is devised and its mathematical theory 
is developed. In July 1944 a second report [8] was issued by the Statistical 
Research Group which gives an elementary non-mathematical exposition of 
the applications of the sequential probability ratio test, together with charts, 
tables and computational simplifications to facilitate applications. 

Independently of the developments here, G. A. Barnard [9] recognized the 
merits of a sequential method of testing, i.e., the possibility of a saving in the 
number of observations as compared with the current most powerful test. He 
also devised an interesting sequential test for testing double dichotomies, which 
differs from the one obtained by applying the sequential probability ratio test. 

Some further developments in the theory of the sequential probability ratio 
test took place in 1944. Extending the methods used in [7], C. M. Stockman 
[10] found the operating characteristic curve of the sequential probability ratio 
test applied to a binomial distribution. Independently of Stockman, Milton 
Friedman and George W. Brown (independently of each other) obtained the 
same result which can be extended to the normal distribution and a few other 
specific distributions, but is not applicable to more general distributions. The 
general operating characteristic curve for any sequential probability ratio test 
was derived by the author [11]. A few months later the author developed a
general theory of cumulative sums [4] which gives not only the operating
characteristic curve for any sequential probability ratio test but also the
characteristic function of the number of observations required by the test.

² Bartky’s multiple sampling scheme [2] for testing the mean of a binomial distribution
provides, of course, an example of such a sequential test (see, for example, the remarks on
p. 377 in [2]). Bartky’s results were not known to us at that time, since they were published
nearly a year later.

³ The material was recently released, making the present publication possible.

The theory of the sequential probability ratio test as given in the present 
paper differs considerably from the exposition given in [7], since the new
developments in [4] have been taken into account. However, some tables and a
few sections of the original report [7] are included in the present paper without 
any substantial changes. 

Part I. Sequential Test of a Simple Hypothesis Against a 
Single Alternative 

1. The Current Test Procedure 

Let $X$ be a random variable. In what follows in this and the subsequent
sections it will be assumed that the random variable $X$ has either a continuous
probability density function or a discrete distribution. Accordingly, by the
probability distribution $f(x)$ of a random variable $X$ we shall mean either the
probability density function of $X$ or the probability that $X = x$, depending upon
whether $X$ is a continuous or a discrete variable. Let the hypothesis $H_0$ to be
tested (null hypothesis) be the statement that the distribution of $X$ is $f_0(x)$.
Suppose that $H_0$ is to be tested against the single alternative hypothesis $H_1$ that
the distribution of $X$ is given by $f_1(x)$.

According to the Neyman-Pearson theory of testing hypotheses a most powerful
critical region $W_N$ for testing $H_0$ against $H_1$ on the basis of $N$ independent
observations $x_1, \cdots, x_N$ on $X$ is given by the set of all sample points
$(x_1, \cdots, x_N)$ for which the inequality

(1.1)   $\dfrac{f_1(x_1) f_1(x_2) \cdots f_1(x_N)}{f_0(x_1) f_0(x_2) \cdots f_0(x_N)} \geq k$

is fulfilled. The quantity $k$ on the right hand side of (1.1) is a constant and is
chosen so that the size of the critical region, i.e., the probability of an error of
the first kind, should have the required value $\alpha$.

For a fixed sample size $N$ the probability $\beta$ of an error of the second kind is a
single valued function of $\alpha$, say $\beta_N(\alpha)$, if a most powerful critical region is used.

Thus, if in addition to fixing the value of $\alpha$ it is required that the probability of
an error of the second kind should have a preassigned value $\beta$, or at least it should
not exceed a preassigned value $\beta$, we are no longer free to choose the sample size
$N$. The minimum number of observations required by the test satisfying these
conditions is equal to the smallest integral value of $N$ for which $\beta_N(\alpha) \leq \beta$.

Thus, the current most powerful test procedure for testing $H_0$ against $H_1$ can
be briefly stated as follows: We choose as critical region the region defined by
(1.1) where the constant $k$ is determined so that the probability of an error of
the first kind should have a preassigned value $\alpha$ and $N$ is equal to the smallest
integer for which the probability of an error of the second kind does not exceed
a preassigned value $\beta$.
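
As an illustration of this rule, the following Python sketch finds the smallest $N$ in the special case where $X$ is normal with known standard deviation and $H_0$, $H_1$ specify the means $\mu_0 < \mu_1$. In that case the region (1.1) reduces to a lower bound on the sample mean and the second-kind error has a closed form. The function names, the safeguard `n_max` and the numerical values are assumptions made for this example only; they do not appear in the paper.

```python
from statistics import NormalDist


def beta_n(alpha, n, mu0, mu1, sigma):
    """Second-kind error of the most powerful size-alpha test based on n observations.

    Assumes X is normal with known standard deviation sigma and mu0 < mu1.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)       # cut-off fixed by the first-kind error
    shift = (mu1 - mu0) * n ** 0.5 / sigma          # standardized distance between hypotheses
    return std_normal.cdf(z_alpha - shift)


def smallest_sample_size(alpha, beta, mu0, mu1, sigma, n_max=1_000_000):
    """Smallest N for which the second-kind error does not exceed beta."""
    for n in range(1, n_max + 1):
        if beta_n(alpha, n, mu0, mu1, sigma) <= beta:
            return n
    raise ValueError("no sample size up to n_max satisfies the requirement")


if __name__ == "__main__":
    print(smallest_sample_size(alpha=0.05, beta=0.05, mu0=0.0, mu1=0.5, sigma=1.0))
```

For these illustrative values the search stops at $N = 44$, which agrees with the closed-form requirement $\sqrt{N}\,(\mu_1 - \mu_0)/\sigma \geq z_{1-\alpha} + z_{1-\beta}$.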
2. The Sequential Test Procedure: General Definitions

2.1. Notion of a sequential test. In current tests of hypotheses the number of
observations is treated as a constant for any particular problem. In sequential
tests the number of observations is no longer a constant, but a random variable.
In what follows the symbol $n$ is used for the number of observations required by
a sequential test and the symbol $N$ is used when the number of observations is
treated as a constant.

Sequential tests can be described as follows: For each positive integer $m$ the
$m$-dimensional sample space $M_m$ is subdivided into three mutually exclusive
parts $R_m^0$, $R_m^1$ and $R_m$. After the first observation $x_1$ has been drawn $H_0$ is
accepted if $x_1$ lies in $R_1^0$, $H_0$ is rejected (i.e., $H_1$ is accepted) if $x_1$ lies in $R_1^1$, or a
second observation is drawn if $x_1$ lies in $R_1$. If the third decision is reached and
a second observation $x_2$ drawn, $H_0$ is accepted, $H_1$ is accepted, or a third
observation is drawn according as the point $(x_1, x_2)$ lies in $R_2^0$, $R_2^1$ or in $R_2$. If $(x_1, x_2)$
lies in $R_2$, a third observation $x_3$ is drawn and one of the three decisions is made
according as $(x_1, x_2, x_3)$ lies in $R_3^0$, $R_3^1$ or in $R_3$, etc. This process is stopped
when, and only when, either the first decision or the second decision is reached.
Let $n$ be the number of observations at which the process is terminated. Then
$n$ is a random variable, since the value of $n$ depends on the outcome of the
observations. (It will be seen later that the probability is one that the sequential
process will be terminated at some finite stage.)

We shall denote by $E_0(n)$ the expected value of $n$ if $H_0$ is true and by $E_1(n)$
the expected value of $n$ if $H_1$ is true. These expected values, of course, depend
on the sequential test used. In order to put this dependence in evidence, we
shall occasionally use the symbols $E_0(n \mid S)$ and $E_1(n \mid S)$ to denote the values
$E_0(n)$ and $E_1(n)$, respectively, when the sequential test $S$ is applied.

2.2. Efficiency of a sequential test. As in the current test procedure, errors of
two kinds may be committed in sequential analysis. We may reject $H_0$ when
it is true (error of the first kind), or we may accept $H_0$ when $H_1$ is true (error of
the second kind). With any sequential test there will be associated two numbers
$\alpha$ and $\beta$ between 0 and 1 such that if $H_0$ is true the probability is $\alpha$ that we
shall commit an error of the first kind and if $H_1$ is true, the probability is $\beta$ that
we shall commit an error of the second kind. We shall say that two sequential
tests $S$ and $S'$ are of equal strength if the values $\alpha$ and $\beta$ associated with $S$ are
equal to the corresponding values $\alpha'$ and $\beta'$ associated with $S'$. If $\alpha < \alpha'$ and
$\beta \leq \beta'$, or if $\alpha \leq \alpha'$ and $\beta < \beta'$, we shall say that $S$ is stronger than $S'$ ($S'$ is
weaker than $S$). If $\alpha > \alpha'$ and $\beta < \beta'$, or if $\alpha < \alpha'$ and $\beta > \beta'$, we shall say
that the strength of $S$ is not comparable with that of $S'$.
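
The ordering by strength just defined is only a partial ordering, which a few lines of Python make explicit. This is a minimal sketch; the function name and the returned labels are hypothetical and chosen for the illustration.

```python
def compare_strength(alpha, beta, alpha_prime, beta_prime):
    """Compare a test S with errors (alpha, beta) to S' with errors (alpha_prime, beta_prime).

    Returns "equal", "stronger" (S is stronger than S'), "weaker", or "not comparable".
    """
    if alpha == alpha_prime and beta == beta_prime:
        return "equal"
    # Neither error is larger and, since the pairs differ, at least one is strictly smaller.
    if alpha <= alpha_prime and beta <= beta_prime:
        return "stronger"
    if alpha >= alpha_prime and beta >= beta_prime:
        return "weaker"
    # One error is smaller and the other larger.
    return "not comparable"


if __name__ == "__main__":
    print(compare_strength(0.01, 0.10, 0.05, 0.10))  # stronger
    print(compare_strength(0.01, 0.20, 0.05, 0.10))  # not comparable
```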

Restricting ourselves to sequential tests of a given strength, we want to make
the number of observations necessary for reaching a final decision as small as
possible. If $S$ and $S'$ are two sequential tests of equal strength we shall say
that $S'$ is better than $S$ if either $E_0(n \mid S') < E_0(n \mid S)$ and $E_1(n \mid S') \leq E_1(n \mid S)$,
or $E_0(n \mid S') \leq E_0(n \mid S)$ and $E_1(n \mid S') < E_1(n \mid S)$. A sequential test
will be said to be an admissible test if no better test of equal strength exists.
If a sequential test $S$ satisfies both inequalities $E_0(n \mid S) \leq E_0(n \mid S')$ and
$E_1(n \mid S) \leq E_1(n \mid S')$ for any sequential test $S'$ of strength equal to that of $S$, then
the test $S$ can be considered to be a best sequential test. That such tests exist,
i.e., that it is possible to minimize $E_0(n)$ and $E_1(n)$ simultaneously, is not proved
here; but it is shown later (section 4.7) that for the so-called sequential
probability ratio test defined in section 3.1 both $E_0(n)$ and $E_1(n)$ are very nearly
minimized.⁴ Thus, for all practical purposes the sequential probability ratio
test can be considered best.

Since it is not known whether a sequential test always exists for which both $E_0(n)$
and $E_1(n)$ are exactly minimized, we need a substitute definition of an optimum
test. Several substitute definitions are possible. We could, for example,
require that the test be admissible and the maximum of the two values $E_0(n)$ and
$E_1(n)$ be minimized, or that the mean $\dfrac{E_0(n) + E_1(n)}{2}$, or some other weighted
average be minimized. All these definitions are equivalent if a sequential test
exists for which both $E_0(n)$ and $E_1(n)$ are minimized; but if they cannot be
minimized simultaneously the definitions differ. Which of them is chosen is of no
significance for the purpose of this paper, since for the sequential probability
ratio test proposed later both expected values $E_0(n)$ and $E_1(n)$ are, if not exactly,
very nearly minimized. If we had a priori knowledge as to how frequently $H_0$
and how frequently $H_1$ will be true in the long run, it would be most reasonable
to minimize a weighted average (weighted by the frequencies of $H_0$ and $H_1$,
respectively) of $E_0(n)$ and $E_1(n)$. However, when such knowledge is absent,
as is usually the case in practical applications, it is perhaps more reasonable to
minimize the maximum of $E_0(n)$ and $E_1(n)$ than to minimize some weighted
average of $E_0(n)$ and $E_1(n)$. Hence the following definition is introduced.

A sequential test $S$ is said to be an optimum test if $S$ is admissible and
$\operatorname{Max}[E_0(n \mid S), E_1(n \mid S)] \leq \operatorname{Max}[E_0(n \mid S'), E_1(n \mid S')]$ for all sequential tests $S'$ of
strength equal to that of $S$.

By the efficiency of a sequential test $S$ is meant the value of the ratio⁵

$\dfrac{\operatorname{Max}[E_0(n \mid S^*), E_1(n \mid S^*)]}{\operatorname{Max}[E_0(n \mid S), E_1(n \mid S)]}$

where $S^*$ is an optimum sequential test of strength equal to that of $S$.

2.3. Efficiency of the current procedure, viewed as a particular case of a sequential
test. The current test procedure can be considered as a particular case of a
sequential test. In fact, let $N$ be the size of the sample used in the current
procedure and let $W_N$ be the critical region on which the test is based. Then the
current procedure can be considered as a sequential test defined as follows: For
all $m < N$, the regions $R_m^0$, $R_m^1$ are the empty subsets of the $m$-dimensional sample
space $M_m$, and $R_m = M_m$. For $m = N$, $R_N^1$ is equal to $W_N$, $R_N^0$ is equal to the
complement $\overline{W}_N$ of $W_N$ and $R_N$ is the empty set. Thus, for the current
procedure we have $E_0(n) = E_1(n) = N$.

⁴ The author conjectures that $E_0(n)$ and $E_1(n)$ are exactly minimized for the sequential
probability ratio test, but he did not succeed in proving this, except for a special class of
problems (see section 4.7).

⁵ The existence of an optimum sequential test is not essential for the definition of
efficiency, since $\operatorname{Max}[E_0(n \mid S^*), E_1(n \mid S^*)]$ could be replaced by the greatest lower bound of
$\operatorname{Max}[E_0(n \mid S'), E_1(n \mid S')]$ with respect to all sequential tests $S'$ of strength equal to that
of $S$.

It will be seen later that the efficiency of the current test based on the most
powerful critical region is rather low. Frequently it is below $\tfrac{1}{2}$. In other words,
an optimum sequential test can attain the same $\alpha$ and $\beta$ as the current most
powerful test on the basis of an expected number of observations much smaller
than the fixed number of observations needed for the current most powerful test.
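
The construction of section 2.3 can be sketched in Python under the assumption that the three regions are represented by a function returning one of the labels `"R0"`, `"R1"` or `"R"` for each partial sample. All names below (`run_sequential_test`, `fixed_sample_classifier`, `efficiency`) and the toy critical region are hypothetical; the expected values passed to `efficiency` are placeholders, not results from the paper.

```python
def run_sequential_test(observations, classify):
    """Apply a sequential test whose regions are encoded by classify(sample)."""
    sample = []
    for x in observations:
        sample.append(x)
        region = classify(tuple(sample))
        if region == "R0":
            return "accept H0", len(sample)
        if region == "R1":
            return "accept H1", len(sample)
        # region == "R": take an additional observation
    return "continue", len(sample)


def fixed_sample_classifier(N, in_critical_region):
    """The current (fixed-sample) procedure written as a sequential test."""
    def classify(sample):
        if len(sample) < N:
            return "R"                      # R_m is the whole space M_m for m < N
        return "R1" if in_critical_region(sample) else "R0"
    return classify


def efficiency(e0_opt, e1_opt, e0, e1):
    """Ratio Max[E0(n|S*), E1(n|S*)] / Max[E0(n|S), E1(n|S)]."""
    return max(e0_opt, e1_opt) / max(e0, e1)


if __name__ == "__main__":
    classify = fixed_sample_classifier(3, lambda s: sum(s) >= 2)
    print(run_sequential_test([1, 1, 1, 1], classify))   # stops at n = N = 3
    print(efficiency(20.0, 25.0, 50.0, 50.0))            # placeholder expected values
```

Whatever the data, the fixed-sample classifier stops at exactly $N$ observations, which is the statement $E_0(n) = E_1(n) = N$.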

In the next section we shall propose a simple sequential test procedure, called 
the sequential probability ratio test, which for all practical purposes can be
considered an optimum sequential test. It will be seen that these sequential tests
usually lead to average savings of about 50% in the number of trials as compared 
with the current most powerful test. 


3. Sequential Probability Ratio Test 

3.1. Definition of the sequential probability ratio test. We have seen in section
2.1 that the sequential test procedure is defined by subdividing the $m$-dimensional
sample space $M_m$ ($m = 1, 2, \cdots$, ad inf.) into three mutually exclusive parts
$R_m^0$, $R_m^1$ and $R_m$. The sequential process is terminated at the smallest value $n$
of $m$ for which the sample point lies either in $R_n^0$ or in $R_n^1$. If the sample point
lies in $R_n^0$ we accept $H_0$ and if it lies in $R_n^1$ we accept $H_1$.

An indication as to the proper choice of the regions $R_m^0$, $R_m^1$ and $R_m$ can be
obtained from the following considerations: Suppose that before the sample is
drawn there exists an a priori probability that $H_0$ is true and the value of this
probability is known. Denote this a priori probability by $g_0$. Then the a priori
probability that $H_1$ is true is given by $g_1 = 1 - g_0$, since it is assumed that the
hypotheses $H_0$ and $H_1$ exhaust all possibilities. After a number of observations
have been made we gain additional information which will affect the probability
that $H_i$ ($i = 0, 1$) is true. Let $g_{0m}$ be the a posteriori probability that $H_0$ is true
and $g_{1m}$ the a posteriori probability that $H_1$ is true after $m$ observations have been
made. Then according to the well known formula of Bayes we have

(3.1)   $g_{0m} = \dfrac{g_0\, p_{0m}(x_1, \cdots, x_m)}{g_0\, p_{0m}(x_1, \cdots, x_m) + g_1\, p_{1m}(x_1, \cdots, x_m)}$

and

(3.2)   $g_{1m} = \dfrac{g_1\, p_{1m}(x_1, \cdots, x_m)}{g_0\, p_{0m}(x_1, \cdots, x_m) + g_1\, p_{1m}(x_1, \cdots, x_m)}$

where $p_{im}(x_1, \cdots, x_m)$ denotes the probability density in the $m$-dimensional
sample space calculated under the hypothesis $H_i$ ($i = 0, 1$).⁶ As an
abbreviation for $p_{im}(x_1, \cdots, x_m)$ we shall use simply $p_{im}$.
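
A minimal Python sketch of formulas (3.1) and (3.2) for independent observations follows. The function name, the two-point example distributions and the prior value are assumptions made for the illustration; `f0` and `f1` stand for the distributions $f_0$ and $f_1$ of a single observation.

```python
import math


def posterior_probabilities(sample, f0, f1, g0):
    """Return (g_0m, g_1m) computed by Bayes' formula from independent observations.

    f0, f1 -- callables giving the density (or probability) of one observation
    g0     -- a priori probability that H0 is true; H1 then has probability 1 - g0
    """
    g1 = 1.0 - g0
    p0m = math.prod(f0(x) for x in sample)   # density of the sample under H0
    p1m = math.prod(f1(x) for x in sample)   # density of the sample under H1
    denominator = g0 * p0m + g1 * p1m
    if denominator == 0.0:
        raise ValueError("the sample has zero probability under both hypotheses")
    return g0 * p0m / denominator, g1 * p1m / denominator


if __name__ == "__main__":
    # Two-point example: probability of a success is 0.5 under H0 and 0.8 under H1.
    f0 = lambda x: 0.5
    f1 = lambda x: 0.8 if x == 1 else 0.2
    print(posterior_probabilities([1, 1, 0], f0, f1, g0=0.5))
```

The two returned values always add up to one, which is the fact used in the argument that the acceptance regions cannot overlap when both thresholds exceed one half.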


⁶ If the probability distribution is discrete, $p_{im}(x_1, \cdots, x_m)$ denotes the probability that
the sample point $(x_1, \cdots, x_m)$ will be obtained.
Let $d_0$ and $d_1$ be two positive numbers less than 1 and greater than $\tfrac{1}{2}$. Suppose
that we want to construct a sequential test such that the conditional probability
of a correct decision under the condition that $H_0$ is accepted is greater than or
equal to $d_0$, and the conditional probability of a correct decision under the
condition that $H_1$ is accepted is greater than or equal to $d_1$.⁷ Then the following
sequential process seems reasonable: At each stage calculate $g_{0m}$ and $g_{1m}$. If
$g_{1m} \geq d_1$, accept $H_1$. If $g_{0m} \geq d_0$, accept $H_0$. If $g_{1m} < d_1$ and $g_{0m} < d_0$, draw
an additional observation. $R_m^0$ in this sequential process is thus defined by the
inequality $g_{0m} \geq d_0$, $R_m^1$ by the inequality $g_{1m} \geq d_1$, and $R_m$ by the simultaneous
inequalities $g_{1m} < d_1$ and $g_{0m} < d_0$. It is necessary that the sets $R_m^0$, $R_m^1$ and
$R_m$ be mutually exclusive and exhaustive. For this it suffices that the
inequalities

(3.3)   $g_{1m} = \dfrac{g_1 p_{1m}}{g_0 p_{0m} + g_1 p_{1m}} \geq d_1$

and

(3.4)   $g_{0m} = \dfrac{g_0 p_{0m}}{g_0 p_{0m} + g_1 p_{1m}} \geq d_0$

be not fulfilled simultaneously. To show that (3.3) and (3.4) are incompatible,
we shall assume that they are simultaneously fulfilled and derive a contradiction
from this assumption. The two inequalities sum to

(3.5)   $g_{1m} + g_{0m} \geq d_1 + d_0.$

Since $g_{0m} + g_{1m} = 1$, we have

$1 \geq d_1 + d_0$

which is impossible, since by assumption $d_i > \tfrac{1}{2}$ ($i = 0, 1$). Hence it is proved
that the sets $R_m^0$, $R_m^1$ and $R_m$ are mutually exclusive and exhaustive.

The inequalities (3.3) and (3.4) are equivalent to the following inequalities,
respectively:

(3.6)   $\dfrac{p_{1m}}{p_{0m}} \geq \dfrac{g_0}{g_1} \cdot \dfrac{d_1}{1 - d_1}$

and

(3.7)   $\dfrac{p_{1m}}{p_{0m}} \leq \dfrac{g_0}{g_1} \cdot \dfrac{1 - d_0}{d_0}.$

The constants on the right hand sides of (3.6) and (3.7) do not depend on $m$.
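
The step from (3.3) to (3.6) is elementary; written out, and assuming $g_0, g_1 > 0$, $p_{0m} > 0$ and $d_1 < 1$ so that every division is permitted, it reads as follows (a worked sketch added for the reader, not part of the original derivation):

```latex
\begin{align*}
\frac{g_1 p_{1m}}{g_0 p_{0m} + g_1 p_{1m}} \geq d_1
  &\iff g_1 p_{1m} \geq d_1 g_0 p_{0m} + d_1 g_1 p_{1m} \\
  &\iff (1 - d_1)\, g_1 p_{1m} \geq d_1\, g_0 p_{0m} \\
  &\iff \frac{p_{1m}}{p_{0m}} \geq \frac{g_0}{g_1} \cdot \frac{d_1}{1 - d_1}.
\end{align*}
```

The same three steps applied to (3.4), with the roles of the two hypotheses interchanged, give (3.7); the direction of the inequality reverses because $p_{1m}$ now sits on the smaller side.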

If an a priori probability of $H_0$ does not exist, or if it is unknown, the
inequalities (3.6) and (3.7) suggest the use of the following sequential test: At each stage
calculate $p_{1m}/p_{0m}$. If $p_{1m} = p_{0m} = 0$, the value of the ratio $p_{1m}/p_{0m}$ is defined
to be equal to 1. Accept $H_1$ if

(3.8)   $\dfrac{p_{1m}}{p_{0m}} \geq A.$

Accept $H_0$ if

(3.9)   $\dfrac{p_{1m}}{p_{0m}} \leq B.$

Take an additional observation if

(3.10)   $B < \dfrac{p_{1m}}{p_{0m}} < A.$

Thus, the number $n$ of observations required by the test is the smallest integral
value of $m$ for which either (3.8) or (3.9) holds. The constants $A$ and $B$ are
chosen so that $0 < B < A$ and the sequential test has the desired value $\alpha$ of the
probability of an error of the first kind and the desired value $\beta$ of the probability
of an error of the second kind. We shall call the test procedure defined by (3.8),
(3.9) and (3.10), a sequential probability ratio test.

⁷ The restrictions $d_0 > 1/2$ and $d_1 > 1/2$ are imposed because otherwise it might happen
that the hypothesis with the smaller a posteriori probability will be accepted.

The sequential test procedure given by (3.8), (3.9) and (3.10) has been justified
here merely on an intuitive basis. Section 4.7, however, shows that for this
sequential test the expected values $E_0(n)$ and $E_1(n)$ are very nearly minimized.⁸
Thus, for practical purposes this test can be considered an optimum test.
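
The rule (3.8), (3.9), (3.10) can be sketched in a few lines of Python. The sketch works with the logarithm of the ratio, which for independent observations is a running sum, and it assumes that $f_0$ and $f_1$ are positive at every observed value, so the convention for the case $p_{1m} = p_{0m} = 0$ is not needed. The names `sprt` and `bernoulli_log_ratio`, the binomial example and the constants $A = 19$, $B = 1/19$ are assumptions chosen for the illustration.

```python
import math


def sprt(observations, log_ratio, A, B):
    """Sequential probability ratio test for independent observations.

    log_ratio(x) must return log(f1(x) / f0(x)). Returns (decision, n), where the
    decision is "accept H1", "accept H0", or "continue" if the data ran out first.
    """
    if not 0 < B < A:
        raise ValueError("the constants must satisfy 0 < B < A")
    log_a, log_b = math.log(A), math.log(B)
    z = 0.0   # log(p_1m / p_0m) after m observations
    n = 0
    for n, x in enumerate(observations, start=1):
        z += log_ratio(x)
        if z >= log_a:       # (3.8)
            return "accept H1", n
        if z <= log_b:       # (3.9)
            return "accept H0", n
        # otherwise (3.10) holds and another observation is taken
    return "continue", n


def bernoulli_log_ratio(p0, p1):
    """Log ratio for one binomial trial with success probability p0 under H0, p1 under H1."""
    def log_ratio(x):
        return math.log(p1 / p0) if x == 1 else math.log((1 - p1) / (1 - p0))
    return log_ratio


if __name__ == "__main__":
    ratio = bernoulli_log_ratio(0.5, 0.8)
    print(sprt([1] * 10, ratio, A=19.0, B=1 / 19))   # ('accept H1', 7)
    print(sprt([0] * 10, ratio, A=19.0, B=1 / 19))   # ('accept H0', 4)
```

With these constants a run of successes multiplies the ratio by 1.6 each time, so it first reaches 19 at the seventh trial; a run of failures multiplies it by 0.4 and first falls to 1/19 or below at the fourth.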

3.2. Fundamental relations among the quantities $\alpha$, $\beta$, $A$ and $B$. In this section
the quantities $\alpha$, $\beta$, $A$ and $B$ will be related by certain inequalities which are of
basic importance for the sequential analysis.

Let $\{x_m\}$ ($m = 1, 2, \cdots$, ad inf.) be an infinite sequence of observations. The
set of all possible infinite sequences $\{x_m\}$ is called the infinite dimensional sample
space. It will be denoted by $M_\infty$. Any particular infinite sequence $\{x_m\}$ is
called a point of $M_\infty$. For any set of $n$ given real numbers $a_1, \cdots, a_n$ we shall
denote by $C(a_1, \cdots, a_n)$ the subset of $M_\infty$ which consists of all points (infinite
sequences) $\{x_m\}$ ($m = 1, 2, \cdots$, ad inf.) for which $x_1 = a_1, \cdots, x_n = a_n$. For
any values $a_1, \cdots, a_n$ the set $C(a_1, \cdots, a_n)$ will be called a cylindric point of
order $n$. A subset $S$ of $M_\infty$ will be called a cylindric point, if there exists a
positive integer $n$ for which $S$ is a cylindric point of order $n$. Thus, a cylindric point
may be a cylindric point of order 1, or of order 2, etc. A cylindric point
$C(a_1, \cdots, a_n)$ will be said to be of type 1 if

$\dfrac{p_{1n}}{p_{0n}} = \dfrac{f_1(a_1) f_1(a_2) \cdots f_1(a_n)}{f_0(a_1) f_0(a_2) \cdots f_0(a_n)} \geq A$

and

$B < \dfrac{p_{1m}}{p_{0m}} = \dfrac{f_1(a_1) \cdots f_1(a_m)}{f_0(a_1) \cdots f_0(a_m)} < A \qquad (m = 1, \cdots, n - 1).$

A cylindric point $C(a_1, \cdots, a_n)$ will be said to be of type 0 if

$\dfrac{p_{1n}}{p_{0n}} = \dfrac{f_1(a_1) \cdots f_1(a_n)}{f_0(a_1) \cdots f_0(a_n)} \leq B$

and

$B < \dfrac{p_{1m}}{p_{0m}} = \dfrac{f_1(a_1) \cdots f_1(a_m)}{f_0(a_1) \cdots f_0(a_m)} < A \qquad (m = 1, \cdots, n - 1).$

Thus, if a sample $(x_1, \cdots, x_n)$ is observed for which $C(x_1, \cdots, x_n)$ is a cylindric
point of type $i$, the sequential test defined by (3.8), (3.9) and (3.10) leads to the
acceptance of $H_i$ ($i = 0, 1$).

⁸ It seems likely to the author that $E_0(n)$ and $E_1(n)$ are exactly minimized for the
sequential probability ratio test. However, he did not succeed in proving it, except for a
special class of problems (see section 4.7).

Let $Q_i$ be the sum of all cylindric points of type $i$ ($i = 0, 1$). For any subset
$M$ of $M_\infty$ we shall denote by $P_i(M)$ the probability of $M$ calculated under the
assumption that $H_i$ is true ($i = 0, 1$). Now we shall prove that

(3.11)   $P_i(Q_0 + Q_1) = 1 \qquad (i = 0, 1).$

This equation means that the probability is equal to one that the sequential
process will eventually terminate. To prove (3.11) we shall denote the variate
$\log \dfrac{f_1(x_i)}{f_0(x_i)}$ by $z_i$ and $z_1 + \cdots + z_m$ by $Z_m$ ($i, m = 1, 2, \cdots$, ad inf.).
Furthermore, denote by $n$ the smallest integer for which either $Z_n \geq \log A$ or
$Z_n \leq \log B$. If no such finite integer $n$ exists we shall say that $n = \infty$. Clearly, $n$ is
the number of observations required by the sequential test and (3.11) is proved
if we show that the probability that $n = \infty$ is zero. But the latter statement
was proved by the author elsewhere (see Lemma 1 in [4]). Hence equation
(3.11) is proved.

With the help of (3.11) we shall be able to derive some important inequalities
satisfied by the quantities $\alpha$, $\beta$, $A$ and $B$. Since for each sample $(x_1, \cdots, x_n)$
for which $C(x_1, \cdots, x_n)$ is an element of $Q_1$ the inequality $p_{1n}/p_{0n} \geq A$ holds,
we see that

(3.12)   $P_1(Q_1) \geq A\, P_0(Q_1).$

Similarly, for each sample $(x_1, \cdots, x_n)$ for which $C(x_1, \cdots, x_n)$ is a point of
$Q_0$ the inequality $p_{1n}/p_{0n} \leq B$ holds. Hence

(3.13)   $P_1(Q_0) \leq B\, P_0(Q_0).$

But $P_0(Q_1)$ is the probability of committing an error of the first kind and $P_1(Q_0)$
is the probability of making an error of the second kind. Thus, we have

(3.14)   $P_0(Q_1) = \alpha; \qquad P_1(Q_0) = \beta.$
Since $Q_0$ and $Q_1$ are disjoint, it follows from (3.11) that

(3.15)   $P_0(Q_0) = 1 - \alpha; \qquad P_1(Q_1) = 1 - \beta.$

From the relations (3.12)-(3.15) we obtain the important inequalities

(3.16)   $1 - \beta \geq A\alpha$

and

(3.17)   $\beta \leq B(1 - \alpha).$

These inequalities can be written as

(3.18)   $\dfrac{\alpha}{1 - \beta} \leq \dfrac{1}{A}$

and

(3.19)   $\dfrac{\beta}{1 - \alpha} \leq B.$

The above inequalities are of great value in practical applications, since they
supply upper limits for $\alpha$ and $\beta$ when $A$ and $B$ are given. For instance, it follows
immediately from (3.18) and (3.19), and the fact that $0 < \alpha < 1$, $0 < \beta < 1$ that

(3.20)   $\alpha \leq \dfrac{1}{A}$

and

(3.21)   $\beta \leq B.$
A pair of values $\alpha$ and $\beta$ can be represented by a point in the plane with the
coordinates $\alpha$ and $\beta$. It is of interest to determine the set of all points $(\alpha, \beta)$
which satisfy the inequalities (3.18) and (3.19) for given values of $A$ and $B$.
Consider the straight lines $L_1$ and $L_2$ in the plane given by the equations

(3.22)   $A\alpha = 1 - \beta$

and

(3.23)   $\beta = B(1 - \alpha),$

respectively. The line $L_1$ intersects the abscissa axis at $\alpha = \dfrac{1}{A}$ and the ordinate
axis at $\beta = 1$. The line $L_2$ intersects the abscissa axis at $\alpha = 1$ and the ordinate
axis at $\beta = B$. The set of all points $(\alpha, \beta)$ which satisfy the inequalities (3.18)
and (3.19) is the interior and the boundary of the quadrilateral determined by
the lines $L_1$, $L_2$ and the coordinate axes. This set is represented by the shaded
area in figure 1.
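
The region just described is easy to examine numerically. The sketch below assumes $A > 1 > B > 0$, so that $L_1$ and $L_2$ cross inside the unit square; the function names and the constants $A = 19$, $B = 1/19$ are hypothetical choices for the illustration.

```python
def satisfies_fundamental_inequalities(alpha, beta, A, B):
    """True if (alpha, beta) satisfies (3.16) and (3.17), i.e. lies in the quadrilateral."""
    inside_unit_square = 0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0
    return inside_unit_square and A * alpha <= 1.0 - beta and beta <= B * (1.0 - alpha)


def region_vertex(A, B):
    """Intersection of L1 (A*alpha = 1 - beta) and L2 (beta = B*(1 - alpha))."""
    if not 0.0 < B < 1.0 < A:
        raise ValueError("this sketch assumes A > 1 > B > 0")
    alpha = (1.0 - B) / (A - B)
    beta = B * (A - 1.0) / (A - B)
    return alpha, beta


if __name__ == "__main__":
    A, B = 19.0, 1.0 / 19.0
    print(region_vertex(A, B))                                   # close to (0.05, 0.05)
    print(satisfies_fundamental_inequalities(0.03, 0.03, A, B))  # True
    print(satisfies_fundamental_inequalities(0.06, 0.03, A, B))  # False
```

The vertex is the only point of the region at which both (3.16) and (3.17) hold with equality; the remaining vertices of the quadrilateral are the origin, $(1/A, 0)$ and $(0, B)$.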

The fundamental inequalities (3.18) and (3.19) were derived under the
assumption that $x_1, x_2, \cdots$, ad inf. are independent observations on the same random
variable $X$. The independence of the observations is, however, not necessary
for the validity of (3.18) and (3.19). In fact, the independence of the
observations was used merely to show the validity of (3.11). But (3.11) can be shown
to hold also for dependent observations under very general conditions. Hence,
if $H_i$ states that the joint distribution of $x_1, x_2, \cdots, x_m$ is given by the joint
probability density function $p_{im}(x_1, \cdots, x_m)$⁹ ($i = 0, 1$; $m = 1, 2, \cdots$, ad inf.)
and if (3.11) holds, then for the sequential test of $H_0$ against $H_1$, as defined by
(3.8), (3.9) and (3.10), the inequalities (3.18) and (3.19) remain valid. For
instance, let $\lambda_0$ and $\lambda_1$ be two different positive values $< 1$ and let $H_i$ ($i = 0, 1$)
be the hypothesis that the joint probability density function of $x_1, \cdots, x_m$ is
given by

$p_{im}(x_1, \cdots, x_m) = \dfrac{1}{(2\pi)^{m/2}} \exp\left\{ -\tfrac{1}{2} x_1^2 - \tfrac{1}{2} \sum_{j=2}^{m} (x_j - \lambda_i x_{j-1})^2 \right\} \qquad (i = 0, 1),$

i.e., that $x_1$ and $(x_j - \lambda_i x_{j-1})$ ($j = 2, 3, \cdots$, ad inf.) are normally and
independently distributed with zero means and unit variances. Then the inequalities
(3.18) and (3.19) will hold for the sequential test defined by (3.8), (3.9) and
(3.10).
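
For this example with dependent observations the ratio $p_{1m}/p_{0m}$ can still be updated one observation at a time, because the terms in $x_1$ cancel and each further observation contributes one difference of squared residuals. The Python sketch below is an illustration under the stated normal model; the function names and numerical values are assumptions, and the sample is taken to be non-empty.

```python
import math


def ar1_log_density(xs, lam):
    """Log of p_im(x_1, ..., x_m) in the example above, with lambda_i = lam."""
    m = len(xs)
    squares = xs[0] ** 2 + sum((cur - lam * prev) ** 2 for prev, cur in zip(xs, xs[1:]))
    return -0.5 * m * math.log(2.0 * math.pi) - 0.5 * squares


def ar1_sprt(xs, lam0, lam1, A, B):
    """Test (3.8)-(3.10) for H0: lambda = lam0 against H1: lambda = lam1."""
    if not 0 < B < A:
        raise ValueError("the constants must satisfy 0 < B < A")
    log_a, log_b = math.log(A), math.log(B)
    z = 0.0   # log(p_1m / p_0m); equal to 0 at m = 1 because the x_1 terms cancel
    for m in range(1, len(xs) + 1):
        if m >= 2:
            prev, cur = xs[m - 2], xs[m - 1]
            z += 0.5 * ((cur - lam0 * prev) ** 2 - (cur - lam1 * prev) ** 2)
        if z >= log_a:
            return "accept H1", m
        if z <= log_b:
            return "accept H0", m
    return "continue", len(xs)


if __name__ == "__main__":
    print(ar1_sprt([5.0, 4.0], lam0=0.2, lam1=0.8, A=19.0, B=1 / 19))  # ('accept H1', 2)
    print(ar1_sprt([5.0, 1.0], lam0=0.2, lam1=0.8, A=19.0, B=1 / 19))  # ('accept H0', 2)
```

In the first call the second observation equals $0.8$ times the first, so the residual under $H_1$ vanishes and the log ratio jumps to $4.5$, above $\log 19$; in the second call the roles are reversed.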

⁹ Of course, for any positive integers $m$ and $m'$ with $m < m'$ the marginal distribution of
$x_1, \cdots, x_m$ determined on the basis of the joint distribution $p_{im'}(x_1, \cdots, x_{m'})$ must be
equal to $p_{im}(x_1, \cdots, x_m)$.

3.3. Determination of the values A and B in practice. Suppose that we wish
to have a sequential test such that the probability of an error of the first kind is
equal to $\alpha$ and the probability of an error of the second kind is equal to $\beta$.
Denote by $A(\alpha, \beta)$ and $B(\alpha, \beta)$ the values of $A$ and $B$ for which the probability of
the errors of the first and second kinds will take the desired values $\alpha$ and $\beta$.
The exact determination of the values $A(\alpha, \beta)$ and $B(\alpha, \beta)$ is rather laborious, as
will be seen in Section 3.4. The inequalities at our disposal, however, permit the
problem to be solved satisfactorily for practical purposes. From (3.18) and
(3.19) it follows that

(3.24)   $A(\alpha, \beta) \leq \dfrac{1 - \beta}{\alpha}$

and

(3.25)   $B(\alpha, \beta) \geq \dfrac{\beta}{1 - \alpha}.$

Suppose we put $A = \dfrac{1 - \beta}{\alpha} = a(\alpha, \beta)$ (say), and $B = \dfrac{\beta}{1 - \alpha} = b(\alpha, \beta)$ (say).
Then $A$ is greater than or equal to the exact value $A(\alpha, \beta)$, and $B$ is less than or
equal to the exact value $B(\alpha, \beta)$. This procedure, of course, changes the
probabilities of errors of the first and second kind. If we were to use the exact value
of $B$ and a value of $A$ which is greater than the exact value, then evidently we
would lower the value of $\alpha$, but slightly increase the value of $\beta$. Similarly, if
we were to use the exact value of $A$ and a value of $B$ which is below the exact
value, then we would lower the value of $\beta$, but slightly increase the value of $\alpha$.
Thus, it is not clear what will be the resulting effect on $\alpha$ and $\beta$ if a value of $A$ is
used which is higher than the exact value, and a value of $B$ is used which is lower
than the exact value. Denote by $\alpha'$ and $\beta'$ the resulting probabilities of errors
of the first and second kind, respectively, if we put $A = \dfrac{1 - \beta}{\alpha}$ and $B = \dfrac{\beta}{1 - \alpha}$.

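We now derive inequalities satisfied by the quantities $\alpha'$, $\beta'$, $\alpha$ and $\beta$.
Substituting $a(\alpha, \beta)$ for $A$, $b(\alpha, \beta)$ for $B$, $\alpha'$ for $\alpha$ and $\beta'$ for $\beta$ we obtain from (3.18)
and (3.19)

(3.26)   $\dfrac{\alpha'}{1 - \beta'} \leq \dfrac{1}{a(\alpha, \beta)} = \dfrac{\alpha}{1 - \beta}$

and

(3.27)   $\dfrac{\beta'}{1 - \alpha'} \leq b(\alpha, \beta) = \dfrac{\beta}{1 - \alpha}.$

From these inequalities it follows that

(3.28)   $\alpha' \leq \dfrac{\alpha}{1 - \beta}$

and

(3.29)   $\beta' \leq \dfrac{\beta}{1 - \alpha}.$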
Multiplying (3.26) by $(1 - \beta)(1 - \beta')$ and (3.27) by $(1 - \alpha)(1 - \alpha')$ and adding
the two resulting inequalities, we have

(3.30)   $\alpha' + \beta' \leq \alpha + \beta.$

Thus, we see that at least one of the inequalities $\alpha' \leq \alpha$ and $\beta' \leq \beta$ must hold.
In other words, by using $a(\alpha, \beta)$ and $b(\alpha, \beta)$ instead of $A(\alpha, \beta)$ and $B(\alpha, \beta)$,
respectively, at most one of the probabilities $\alpha$ and $\beta$ may be increased.

If $\alpha$ and $\beta$ are small (say less than .05), as they frequently will be in practical
applications, $\dfrac{\alpha}{1 - \beta}$ and $\dfrac{\beta}{1 - \alpha}$ are nearly equal to $\alpha$ and $\beta$, respectively. Thus,
we see from (3.28) and (3.29) that the quantity by which $\alpha'$ can possibly exceed
$\alpha$, or $\beta'$ can exceed $\beta$, must be small. Section 3.4 contains further inequalities
which show that the amount by which $\alpha'$ ($\beta'$) can possibly exceed $\alpha$ ($\beta$) is indeed
extremely small. Thus, for all practical purposes $\alpha' \le \alpha$ and $\beta' \le \beta$.
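
The size of the possible excess can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function name `error_bounds` is hypothetical, and it simply evaluates the upper limits $\alpha/(1-\beta)$ and $\beta/(1-\alpha)$ of (3.28) and (3.29) for a few illustrative pairs.

```python
def error_bounds(alpha, beta):
    """Upper limits (3.28) and (3.29) on the actual error probabilities
    alpha' and beta' when A = (1 - beta)/alpha and B = beta/(1 - alpha)."""
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need 0 < alpha, beta and alpha + beta < 1")
    alpha_max = alpha / (1 - beta)   # bound on alpha', from (3.28)
    beta_max = beta / (1 - alpha)    # bound on beta', from (3.29)
    return alpha_max, beta_max


# Illustrative values only: the excess over the nominal level stays small.
for a, b in [(0.05, 0.05), (0.01, 0.05), (0.01, 0.01)]:
    a_max, b_max = error_bounds(a, b)
    print(f"alpha={a:.2f} beta={b:.2f}  alpha' <= {a_max:.5f}  beta' <= {b_max:.5f}")
```

For $\alpha = \beta = .05$ both limits are about .0526, so neither error probability can exceed its nominal value by more than about .003.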

If $f_1(x)$ (the distribution under the alternative hypothesis) is sufficiently near
$f_0(x)$ (the distribution under the null hypothesis), $A(\alpha, \beta)$ and $B(\alpha, \beta)$ will be
nearly equal to $\dfrac{1 - \beta}{\alpha}$ and $\dfrac{\beta}{1 - \alpha}$, respectively; and consequently $\alpha'$ and $\beta'$ are
also very nearly equal to $\alpha$ and $\beta$, respectively. The reason that (3.18) and
(3.19), and therefore also (3.24) and (3.25), are inequalities instead of equalities
is that the sequential process may terminate with $\dfrac{p_{1n}}{p_{0n}} > A$ or $\dfrac{p_{1n}}{p_{0n}} < B$. If at
the final stage $\dfrac{p_{1n}}{p_{0n}}$ were exactly equal to $A$ or $B$, then $A(\alpha, \beta)$ and $B(\alpha, \beta)$ would
be exactly $\dfrac{1 - \beta}{\alpha}$ and $\dfrac{\beta}{1 - \alpha}$, respectively. If $f_1(x)$ is near $f_0(x)$, it is almost
certain that the value of $\dfrac{p_{1n}}{p_{0n}}$ is changed only slightly by one additional observation.
Thus, at the final stage $\dfrac{p_{1n}}{p_{0n}}$ will be only slightly above $A$, or slightly below
$B$, and consequently $A(\alpha, \beta)$ and $B(\alpha, \beta)$ will be nearly equal to $\dfrac{1 - \beta}{\alpha}$ and $\dfrac{\beta}{1 - \alpha}$,
respectively. If fractional observations were possible, that is to say, if the number
of observations were a continuous variable, $\dfrac{p_{1m}}{p_{0m}}$ would also be a continuous
function of $m$ and consequently $A(\alpha, \beta)$ and $B(\alpha, \beta)$ would be exactly equal to
$\dfrac{1 - \beta}{\alpha}$ and $\dfrac{\beta}{1 - \alpha}$, respectively. Thus, we have inequalities in (3.24) and (3.25)
instead of equalities merely on account of the fact that the number $m$ of observations
is discontinuous, i.e., $m$ can take only integral values.

Hence for all practical purposes the following procedure can be adopted: To
construct a sequential test such that the probability of an error of the first kind does
not exceed $\alpha$ and the probability of an error of the second kind does not exceed $\beta$, put
$A = \dfrac{1 - \beta}{\alpha}$ and $B = \dfrac{\beta}{1 - \alpha}$ and carry out the sequential test as defined by the
inequalities (3.8), (3.9) and (3.10).
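
As an illustration of this procedure, the sketch below runs a sequential probability ratio test with $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$. It assumes that the inequalities (3.8), (3.9) and (3.10), which are stated outside this section, amount to the usual rule: continue while $B < p_{1m}/p_{0m} < A$, accept $H_1$ when the ratio is $\ge A$, and accept $H_0$ when it is $\le B$. The names `wald_thresholds`, `sprt` and `bernoulli_log_lr` are hypothetical and serve only this example.

```python
import math


def wald_thresholds(alpha, beta):
    """Practical boundaries A = (1 - beta)/alpha and B = beta/(1 - alpha)."""
    return (1 - beta) / alpha, beta / (1 - alpha)


def bernoulli_log_lr(p0, p1):
    """Return a function x -> log f1(x)/f0(x) for a 0/1 variable (example only)."""
    def log_lr(x):
        return math.log(p1 / p0) if x == 1 else math.log((1 - p1) / (1 - p0))
    return log_lr


def sprt(observations, log_lr, alpha, beta):
    """Accumulate the log probability ratio and stop at the first crossing.

    Returns (decision, number of observations used)."""
    A, B = wald_thresholds(alpha, beta)
    log_a, log_b = math.log(A), math.log(B)
    total, n = 0.0, 0
    for n, x in enumerate(observations, start=1):
        total += log_lr(x)
        if total >= log_a:      # ratio >= A: accept H1 (assumed form of the rule)
            return "accept H1", n
        if total <= log_b:      # ratio <= B: accept H0 (assumed form of the rule)
            return "accept H0", n
    return "undecided", n       # data exhausted before a boundary was reached


# Example with p0 = 0.3, p1 = 0.6 and alpha = beta = 0.05.
lr = bernoulli_log_lr(0.3, 0.6)
print(sprt([1, 1, 0, 1, 1, 1, 1], lr, 0.05, 0.05))
```

Note that nothing beyond the two error levels and the log ratio of each observation is needed to carry out the test.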

In most practical cases the calculation of the exact values $A(\alpha, \beta)$ and $B(\alpha, \beta)$
will be of little interest for the following reasons: When $A = a(\alpha, \beta) = \dfrac{1 - \beta}{\alpha}$
and $B = b(\alpha, \beta) = \dfrac{\beta}{1 - \alpha}$, the probability $\alpha'$ of an error of the first kind cannot
exceed $\alpha$ and the probability $\beta'$ of an error of the second kind cannot exceed $\beta$,
except by a very small quantity which can be neglected for practical purposes.
Thus, for all practical purposes the use of $a(\alpha, \beta)$ and $b(\alpha, \beta)$ instead of $A(\alpha, \beta)$
and $B(\alpha, \beta)$ will not decrease the strength of the sequential test. The only
possible disadvantage from the substitution is that it may increase the expected
number of trials necessary for a decision. Since the discrepancy between $A(\alpha, \beta)$
and $B(\alpha, \beta)$ on the one hand and $a(\alpha, \beta)$ and $b(\alpha, \beta)$ on the other arises only
from the discontinuity of the number $m$ of observations, it is clear that the
increase in the expected number of trials caused by the use of $a(\alpha, \beta)$ and $b(\alpha, \beta)$
will be slight. This slight increase, however, cannot be considered entirely a
loss for the following reason: if $a(\alpha, \beta) > A(\alpha, \beta)$ or $b(\alpha, \beta) < B(\alpha, \beta)$, then we
can sharpen the inequality (3.30) to $\alpha' + \beta' < \alpha + \beta$. Hence by using $a(\alpha, \beta)$
and $b(\alpha, \beta)$ we gain in strength.

The fact that for practical purposes we may put $A = a(\alpha, \beta)$ and $B = b(\alpha, \beta)$
brings out a surprising feature of the sequential test as compared with
current tests. While current tests cannot be carried out without finding the
probability distribution of the statistic on which the test is based, there are no
distribution problems in connection with sequential tests. In fact, $a(\alpha, \beta)$ and
$b(\alpha, \beta)$ depend on $\alpha$ and $\beta$ only, and the ratio $\dfrac{p_{1n}}{p_{0n}}$ can be calculated from the data
of the problem without solving any distribution problems. Distribution problems
arise in connection with the sequential process only if it is desired to find the
probability distribution of the number of trials necessary for reaching a final
decision. (This subject is discussed later.) But this is of secondary importance
as long as we know that the sequential test on the average leads to a saving in
the number of trials.

3.4. Probability of accepting $H_0$ (or $H_1$) when some third hypothesis $H$ is true.
In Section 3.2 we were concerned with the probability that the sequential
probability ratio test will lead to the acceptance of $H_0$ (or $H_1$) when $H_0$ or $H_1$ is true.
Since in Part II we shall admit an infinite set of alternatives, and since this is
the practically important case, it is of interest to study the probability of accepting
$H_0$ (or $H_1$) when any third hypothesis $H$, not necessarily equal to $H_0$ or $H_1$,
is true. Let $H$ be the hypothesis that the distribution of $X$ is given by $f(x)$.
If $f(x)$ is equal to $f_0(x)$ or $f_1(x)$ we have the special case discussed in Section 3.2.
In what follows in this and the subsequent sections any probability relationship
will be stated on the assumption that $H$ is true, unless a statement to the
contrary is explicitly made. Denote by $\gamma$ the probability that the sequential
probability ratio test will lead to the acceptance of $H_1$.¹⁰ Clearly, if $H = H_0$, then
$\gamma = \alpha$ and if $H = H_1$, then $\gamma = 1 - \beta$.

The probability $\gamma$ can readily be derived on the basis of the general theory of
cumulative sums given in [4]. Denote $\log \dfrac{f_1(x_i)}{f_0(x_i)}$ by $z_i$. Then $\{z_i\}$ ($i = 1, 2, \dots$,
ad inf.) is a sequence of independent random variables each having the same
distribution. Denote by $Z_j$ the sum of the first $j$ elements of the sequence $\{z_i\}$, i.e.,

(3.31)  $Z_j = z_1 + \cdots + z_j$  ($j = 1, 2, \dots$, ad inf.).


For any relation $R$ we shall denote by $P(R)$ the probability that $R$ holds. For
any random variable $Y$ the symbol $EY$ will denote the expected value of $Y$.
Let $n$ be the smallest positive integer for which either $Z_n \ge \log A$ or $Z_n \le \log B$
holds. If $\log B < Z_m < \log A$ holds for $m = 1, 2, \dots$, ad inf., we shall say that
$n = \infty$. Obviously, $n$ is the number of observations required by the sequential
probability ratio test. As we have seen in Section 3.3, in practice we shall put
$A = a(\alpha, \beta) = \dfrac{1 - \beta}{\alpha}$ and $B = b(\alpha, \beta) = \dfrac{\beta}{1 - \alpha}$. Since $B$ must be less than $A$,
we shall consider only values $\alpha$ and $\beta$ for which $\dfrac{1 - \beta}{\alpha} > \dfrac{\beta}{1 - \alpha}$. This inequality
is equivalent to $\alpha + \beta < 1$, which in turn implies that $B < 1$ and $A > 1$. Thus,
in all that follows it will be assumed that $A > 1$ and $B < 1$. We shall also
assume that the variance of $z_i$ is not zero.

According to Lemma 1 in [4] the relation $P(n = \infty) = 0$ holds. Hence, the
probability is equal to one that the sequential process will eventually terminate.
This implies that the probability of accepting $H_0$ is equal to $1 - \gamma$.

Let $z$ be a random variable whose distribution is equal to the common
distribution of the variates $z_i$ ($i = 1, 2, \dots$, ad inf.), and denote by $\varphi(t)$ the moment
generating function of $z$, i.e.,

$\varphi(t) = E e^{zt}$.

It was shown in [4] that under very mild restrictions on the distribution of $z$
there exists exactly one real value $h$ such that $h \ne 0$ and $\varphi(h) = 1$. Furthermore,
it was shown in [4] (see equation (16) in [4]) that

(3.32)  $E e^{Z_n h} = 1$.

Let $E^*$ be the conditional expected value of $e^{Z_n h}$ under the restriction that $H_0$
is accepted, i.e., that $Z_n \le \log B$, and let $E^{**}$ be the conditional expected value
of $e^{Z_n h}$ under the restriction that $H_1$ is accepted, i.e., that $Z_n \ge \log A$. Then we
obtain from (3.32)

(3.33)  $(1 - \gamma) E^* + \gamma E^{**} = 1$.

¹⁰ The probability that $H_0$ will be accepted is equal to $1 - \gamma$, as will be seen later.

Solving for $\gamma$ we obtain

(3.34)  $\gamma = \dfrac{1 - E^*}{E^{**} - E^*}$.

If both the absolute value of $Ez$ and the variance of $z$ are small, which will be the
case when $f_1(x)$ is near $f_0(x)$, $E^*$ and $E^{**}$ will be nearly equal to $B^h$ and $A^h$,
respectively. Hence, in this case a good approximation to $\gamma$ is given by the
expression

(3.35)  $\bar{\gamma} = \dfrac{1 - B^h}{A^h - B^h}$.

It is easy to verify that $h = 1$ if $H = H_0$, and $h = -1$ if $H = H_1$. The
difference $\bar{\gamma} - \gamma$ approaches zero if both the mean and the variance of $z$ converge to
zero.
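
The approximation (3.35) is easy to evaluate once $h$ is known. The sketch below is illustrative only (the function name `acceptance_prob_h1` is hypothetical); it evaluates $(1 - B^h)/(A^h - B^h)$ and checks the two special cases $h = 1$ and $h = -1$, for which the approximation returns $\alpha$ and $1 - \beta$ when $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$.

```python
def acceptance_prob_h1(A, B, h):
    """Approximate probability (3.35) of accepting H1, neglecting the excess
    of Z_n over the boundaries; h is the non-zero root of phi(h) = 1."""
    if h == 0:
        raise ValueError("h must be non-zero")
    return (1.0 - B ** h) / (A ** h - B ** h)


alpha, beta = 0.05, 0.10
A = (1 - beta) / alpha
B = beta / (1 - alpha)

print(acceptance_prob_h1(A, B, 1.0))    # H = H0: approximately alpha
print(acceptance_prob_h1(A, B, -1.0))   # H = H1: approximately 1 - beta
print(acceptance_prob_h1(A, B, 0.3))    # some intermediate hypothesis H
```

Evaluating the same function over a range of $h$ gives the approximate probability of accepting $H_1$ as a function of the true hypothesis.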

To judge the goodness of the approximation given by $\bar{\gamma}$, it is desirable to
derive lower and upper limits for $\gamma$. Such limits for $\gamma$ can be obtained by deriving
lower and upper limits for $E^*$ and $E^{**}$. First we consider the case when $h > 0$.
Let $\zeta$ be a real variable restricted to values $> 1$, and let $\rho$ be a positive variable
restricted to values $< 1$. For any random variable $Y$ and any relationship $R$
we shall denote by $E(Y \mid R)$ the conditional expected value of $Y$ under the
restriction that $R$ holds. It was shown in [4] that the following inequalities hold:¹¹

(3.36)  $B^h \left\{ \underset{\zeta}{\mathrm{g.l.b.}}\; \zeta\, E\!\left(e^{hz} \,\middle|\, e^{hz} \le \tfrac{1}{\zeta}\right) \right\} \le E^* \le B^h$  ($h > 0$)

and

(3.37)  $A^h \le E^{**} \le A^h \left\{ \underset{\rho}{\mathrm{l.u.b.}}\; \rho\, E\!\left(e^{hz} \,\middle|\, e^{hz} \ge \tfrac{1}{\rho}\right) \right\}$  ($h > 0$).

The symbol $\underset{\zeta}{\mathrm{g.l.b.}}$ stands for the greatest lower bound with respect to $\zeta$, and the
symbol $\underset{\rho}{\mathrm{l.u.b.}}$ stands for least upper bound with respect to $\rho$. Putting

(3.38)  $\underset{\zeta}{\mathrm{g.l.b.}}\; \zeta\, E\!\left(e^{hz} \,\middle|\, e^{hz} \le \tfrac{1}{\zeta}\right) = \eta$

and

(3.39)  $\underset{\rho}{\mathrm{l.u.b.}}\; \rho\, E\!\left(e^{hz} \,\middle|\, e^{hz} \ge \tfrac{1}{\rho}\right) = \delta$,

the inequalities (3.36) and (3.37) can be written as

(3.40)  $B^h \eta \le E^* \le B^h$  ($h > 0$)

and

(3.41)  $A^h \le E^{**} \le A^h \delta$  ($h > 0$).

¹¹ See relations (23) and (26) in [4]. The notation used here is somewhat different from
that in [4].

Since $B < 1$ and $A > 1$, we see that $E^* < 1$ and $E^{**} > 1$ if $h > 0$. From
this and the relations (3.34), (3.40) and (3.41) it follows easily that

(3.42)  $\dfrac{1 - B^h}{\delta A^h - B^h} \le \gamma \le \dfrac{1 - \eta B^h}{A^h - \eta B^h}$  ($h > 0$).

If $h < 0$, limits for $\gamma$ can be obtained as follows: Let $z' = -z$, $A' = \dfrac{1}{B}$, $B' = \dfrac{1}{A}$.
Then $h' = -h > 0$ and $\gamma' = 1 - \gamma$. Thus, according to (3.42) we have

(3.43)  $\dfrac{1 - (B')^{h'}}{\delta' (A')^{h'} - (B')^{h'}} \le 1 - \gamma \le \dfrac{1 - \eta' (B')^{h'}}{(A')^{h'} - \eta' (B')^{h'}}$

where $\delta'$ and $\eta'$ are equal to the expressions we obtain from (3.39) and (3.38),
respectively, by substituting $h'$ for $h$ and $z'$ for $z$. Since $\eta$ and $\delta$ depend only on
the product $hz = h'z'$, we see that $\delta' = \delta$ and $\eta' = \eta$. Hence, we obtain from
(3.43)

(3.44)  $\dfrac{1 - A^h}{\delta B^h - A^h} \le 1 - \gamma \le \dfrac{1 - \eta A^h}{B^h - \eta A^h}$  ($h < 0$)

where $\eta$ and $\delta$ are given by (3.38) and (3.39), respectively.

In Section 3.5 we shall calculate the value of $\eta$ and $\delta$ for binomial and normal
distributions. If the limits of $\gamma$, as given in (3.42) and (3.44), are too far apart,
it may be desirable to determine the exact value of $\gamma$, or at least to find a closer
approximation to $\gamma$ than that given in (3.35). A solution of this problem is
given in [4] (see section 7 of that paper). There the exact value of $\gamma$ is derived
when $z$ can take only a finite number of integral multiples of a constant $d$. If $z$
does not have this property, arbitrarily fine approximation to the value of $\gamma$
can be obtained, since the distribution of $z$ can be approximated to any desired
degree by a discrete distribution of the type mentioned before if the constant $d$
is chosen sufficiently small. The results obtained in [4] can be stated as follows:
There is no loss of generality in assuming that $d = 1$, since the quantity $d$ can
be chosen as the unit of measurement. Thus, we shall assume that $z$ takes only
a finite number of integral values. Let $g_1$ and $g_2$ be two positive integers such
that $P(z = -g_1)$ and $P(z = g_2)$ are positive and $z$ can take only integral values
$\ge -g_1$ and $\le g_2$. Denote $P(z = i)$ by $h_i$. Then the moment generating
function of $z$ is given by

$\varphi(t) = \sum_{i=-g_1}^{g_2} h_i e^{it}$.

Put $u = e^t$ and let $u_1, \dots, u_g$ be the $g = g_1 + g_2$ roots of the equation of $g$-th
degree

(3.45)  $\sum_{i=-g_1}^{g_2} h_i u^i = 1$.

Denote by $[a]$ the smallest integer $\ge \log A$, and by $[b]$ the largest integer $\le \log B$.
Then $Z_n$ can take only the values

(3.46)  $[b] - g_1 + 1,\ [b] - g_1 + 2,\ \dots,\ [b],\ [a],\ [a] + 1,\ \dots,\ [a] + g_2 - 1$.

Denote the $g$ different integers in (3.46) by $c_1, \dots, c_g$, respectively. Let $\Delta$ be
the determinant value of the matrix $\| u_i^{c_j} \|$ ($i, j = 1, \dots, g$) and let $\Delta_j$ be the
determinant we obtain from $\Delta$ by substituting 1 for the elements in the $j$-th
column. Then, if $\Delta \ne 0$, the probability that $Z_n = c_j$ is given by

(3.47)  $P(Z_n = c_j) = \dfrac{\Delta_j}{\Delta}$.

Hence

(3.48)  $\gamma = P(Z_n \ge [a]) = \sum_j \dfrac{\Delta_j}{\Delta}$

where the summation is to be taken over all values of $j$ for which $c_j \ge [a]$.

3.5. Calculation of $\delta$ and $\eta$ for binomial and normal distributions. Let $X$ be a
random variable which can take only the values 0 and 1. Let the probability
that $X = 1$ be $p_i$ if $H_i$ is true ($i = 0, 1$), and $p$ if $H$ is true. Denote $1 - p$ by $q$
and $1 - p_i$ by $q_i$ ($i = 0, 1$). Then $f_i(1) = p_i$, $f_i(0) = q_i$, $f(1) = p$ and $f(0) = q$.
It can be assumed without loss of generality that $p_1 > p_0$. The moment generating
function of $z = \log \dfrac{f_1(x)}{f_0(x)}$ is given by

$\varphi(t) = E\left(\dfrac{f_1(x)}{f_0(x)}\right)^t = p\left(\dfrac{p_1}{p_0}\right)^t + q\left(\dfrac{q_1}{q_0}\right)^t$.

Let $h \ne 0$ be the value of $t$ for which $\varphi(h) = 1$, i.e.,

$p\left(\dfrac{p_1}{p_0}\right)^h + q\left(\dfrac{q_1}{q_0}\right)^h = 1$.

First we consider the case when $h > 0$. It is clear that $e^{zh} = \left(\dfrac{f_1(x)}{f_0(x)}\right)^h > 1$
implies that $x = 1$. Hence $e^{zh} > 1$ implies that $e^{zh} = \left(\dfrac{p_1}{p_0}\right)^h$. From
this and the definition of $\delta$ given in (3.39) it follows that

(3.49)  $\delta = \left(\dfrac{p_1}{p_0}\right)^h$  ($h > 0$).

Similarly, the inequality $e^{zh} < 1$ implies that $e^{zh} = \left(\dfrac{q_1}{q_0}\right)^h$. From this and the
definition of $\eta$ given in (3.38) it follows that

(3.50)  $\eta = \left(\dfrac{q_1}{q_0}\right)^h$  ($h > 0$).

If $h < 0$, it can be shown in a similar way that

(3.51)  $\eta = \left(\dfrac{p_1}{p_0}\right)^h$  ($h < 0$)

and

(3.52)  $\delta = \left(\dfrac{q_1}{q_0}\right)^h$  ($h < 0$).
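
For the binomial case the root $h$ has no closed form in general, but it can be found numerically. The sketch below is only an illustration under stated assumptions: the function names `solve_h` and `delta_eta` are hypothetical, bisection is a technique chosen here and not one prescribed by the text, and the values $p_0 = 0.3$, $p_1 = 0.6$ are arbitrary. It solves $p(p_1/p_0)^h + q(q_1/q_0)^h = 1$ for $h \ne 0$ and then evaluates $\delta$ and $\eta$ as in (3.49) to (3.52).

```python
import math


def solve_h(p, p0, p1, tol=1e-12):
    """Non-zero root h of p*(p1/p0)**h + q*(q1/q0)**h = 1, by bisection."""
    q, q0, q1 = 1 - p, 1 - p0, 1 - p1
    r1, r0 = p1 / p0, q1 / q0
    mean_z = p * math.log(r1) + q * math.log(r0)   # Ez under H
    if mean_z == 0:
        raise ValueError("Ez = 0: there is no non-zero root")
    sign = 1.0 if mean_z < 0 else -1.0             # h > 0 exactly when Ez < 0

    def g(s):                                      # phi(sign * s) - 1 for s > 0
        t = sign * s
        return p * r1 ** t + q * r0 ** t - 1.0

    hi = 1.0
    while g(hi) <= 0:                              # move right until phi > 1
        hi *= 2
    lo = hi
    while g(lo) >= 0:                              # move left until phi < 1
        lo /= 2
        if lo < 1e-12:
            raise ValueError("root too close to zero to bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return sign * (lo + hi) / 2


def delta_eta(p, p0, p1):
    """Return (h, delta, eta) for the binomial case."""
    h = solve_h(p, p0, p1)
    a, b = (p1 / p0) ** h, ((1 - p1) / (1 - p0)) ** h
    return (h, a, b) if h > 0 else (h, b, a)


print(delta_eta(0.30, 0.3, 0.6))   # H = H0: h is 1
print(delta_eta(0.60, 0.3, 0.6))   # H = H1: h is -1
print(delta_eta(0.40, 0.3, 0.6))   # an intermediate hypothesis
```

In every case $\delta \ge 1 \ge \eta$, as the definitions (3.38) and (3.39) require.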

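Now we shall calculate the values of $\delta$ and $\eta$ if $X$ is normally distributed. Let

(3.53)  $f_i(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta_i)^2}$  ($i = 0, 1$)

and

(3.54)  $f(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta)^2}$.

We can assume without loss of generality that $\theta_0 = -\Delta$ and $\theta_1 = \Delta$ where $\Delta > 0$,
since this can always be achieved by a translation. Then

(3.55)  $z = \log \dfrac{f_1(x)}{f_0(x)} = 2\Delta x$.

The moment generating function of $z$ is given by

(3.56)  $\varphi(t) = e^{2\Delta\theta t + 2\Delta^2 t^2}$.

Hence

(3.57)  $h = -\dfrac{\theta}{\Delta}$.

Substituting this value of $h$ in (3.38) and (3.39) we obtain

(3.58)  $\delta = \underset{\rho}{\mathrm{l.u.b.}}\; \rho\, E\!\left(e^{-2\theta x} \,\middle|\, e^{-2\theta x} \ge \tfrac{1}{\rho}\right)$

and

(3.59)  $\eta = \underset{\zeta}{\mathrm{g.l.b.}}\; \zeta\, E\!\left(e^{-2\theta x} \,\middle|\, e^{-2\theta x} \le \tfrac{1}{\zeta}\right)$.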

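For any relation $R$ let $P^*(R)$ denote the probability that the relation $R$ holds
calculated under the assumption that the distribution of $x$ is normal with mean
$\theta$ and variance unity. Furthermore, let $P^{**}(R)$ denote the probability that $R$
holds if the distribution of $x$ is normal with mean $-\theta$ and variance unity. Since
$e^{-2\theta x}$ is equal to the ratio of the normal probability density function with mean
$-\theta$ and variance unity to the normal probability density function with mean $\theta$
and variance unity, we see that

(3.60)  $\delta = \underset{\rho}{\mathrm{l.u.b.}}\; \rho\, \dfrac{P^{**}\!\left(e^{-2\theta x} \ge \tfrac{1}{\rho}\right)}{P^{*}\!\left(e^{-2\theta x} \ge \tfrac{1}{\rho}\right)}$

and

(3.61)  $\eta = \underset{\zeta}{\mathrm{g.l.b.}}\; \zeta\, \dfrac{P^{**}\!\left(e^{-2\theta x} \le \tfrac{1}{\zeta}\right)}{P^{*}\!\left(e^{-2\theta x} \le \tfrac{1}{\zeta}\right)}$.

It can easily be verified that the right hand side expressions in (3.60) and
(3.61) have the same values for $\theta = \lambda$ as for $\theta = -\lambda$. Thus, also $\delta$ and $\eta$ have the
same values for $\theta = \lambda$ as for $\theta = -\lambda$. It will be, therefore, sufficient to compute
$\delta$ and $\eta$ for negative values of $\theta$. Let $\theta = -\lambda$ where $\lambda > 0$. First we show that
$\eta = \dfrac{1}{\delta}$. Clearly

(3.62)  $\zeta\, \dfrac{P^{**}\!\left(e^{2\lambda x} \le \tfrac{1}{\zeta}\right)}{P^{*}\!\left(e^{2\lambda x} \le \tfrac{1}{\zeta}\right)} = \zeta\, \dfrac{P^{**}\!\left(e^{-2\lambda x} \ge \zeta\right)}{P^{*}\!\left(e^{-2\lambda x} \ge \zeta\right)}$  ($1 < \zeta < \infty$).

Letting $\zeta = \dfrac{1}{\rho}$ ($0 < \rho < 1$) in (3.62) gives

(3.63)  $\zeta\, \dfrac{P^{**}\!\left(e^{2\lambda x} \le \tfrac{1}{\zeta}\right)}{P^{*}\!\left(e^{2\lambda x} \le \tfrac{1}{\zeta}\right)} = \dfrac{P^{**}\!\left(e^{-2\lambda x} \ge \tfrac{1}{\rho}\right)}{\rho\, P^{*}\!\left(e^{-2\lambda x} \ge \tfrac{1}{\rho}\right)}$.

Hence

(3.64)  $\eta = \underset{\zeta}{\mathrm{g.l.b.}} \left\{ \zeta\, \dfrac{P^{**}\!\left(e^{2\lambda x} \le \tfrac{1}{\zeta}\right)}{P^{*}\!\left(e^{2\lambda x} \le \tfrac{1}{\zeta}\right)} \right\} = \dfrac{1}{\underset{\rho}{\mathrm{l.u.b.}} \left\{ \rho\, \dfrac{P^{*}\!\left(e^{-2\lambda x} \ge \tfrac{1}{\rho}\right)}{P^{**}\!\left(e^{-2\lambda x} \ge \tfrac{1}{\rho}\right)} \right\}}$.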


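Because of the symmetry of the normal distribution, it is easily seen that

$P^{*}\!\left(e^{-2\lambda x} \ge \tfrac{1}{\rho}\right) = P^{**}\!\left(e^{2\lambda x} \ge \tfrac{1}{\rho}\right)$  and  $P^{**}\!\left(e^{-2\lambda x} \ge \tfrac{1}{\rho}\right) = P^{*}\!\left(e^{2\lambda x} \ge \tfrac{1}{\rho}\right)$.

Hence

(3.65)  $\eta = \dfrac{1}{\delta}$.

Now we shall calculate the value of $\delta$. Denote $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_x^{\infty} e^{-t^2/2}\, dt$ by $G(x)$. Then

$P^{**}\!\left(e^{2\lambda x} \ge \tfrac{1}{\rho}\right) = P^{**}\!\left(2\lambda x \ge \log \tfrac{1}{\rho}\right) = P^{**}\!\left(x \ge \tfrac{1}{2\lambda} \log \tfrac{1}{\rho}\right) = G\!\left(\tfrac{1}{2\lambda} \log \tfrac{1}{\rho} - \lambda\right)$.

Similarly

$P^{*}\!\left(e^{2\lambda x} \ge \tfrac{1}{\rho}\right) = P^{*}\!\left(x \ge \tfrac{1}{2\lambda} \log \tfrac{1}{\rho}\right) = G\!\left(\tfrac{1}{2\lambda} \log \tfrac{1}{\rho} + \lambda\right)$.

Denote $\dfrac{1}{2\lambda} \log \dfrac{1}{\rho}$ by $u$. Since $\rho$ can vary from 0 to 1, $u$ can take any value from
0 to $\infty$. Since $\rho = e^{-2\lambda u}$, we have

(3.66)  $\delta = \underset{u}{\mathrm{l.u.b.}} \left[ e^{-2\lambda u}\, \dfrac{G(u - \lambda)}{G(u + \lambda)} \right]$  ($0 \le u < \infty$).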




We shall prove that

(3.67)  $\chi(u) = e^{-2\lambda u}\, \dfrac{G(u - \lambda)}{G(u + \lambda)}$

is a monotonically decreasing function of $u$ and consequently the maximum is
at $u = 0$. For this purpose it suffices to show that the derivative of $\log \chi(u)$
is never positive. Now

(3.68)  $\log \chi(u) = \log G(u - \lambda) - \log G(u + \lambda) - 2\lambda u$.

Denote $\dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ by $\Phi(x)$. Since $\dfrac{d}{du} G(u) = -\Phi(u)$ it follows from (3.68) that

(3.69)  $\dfrac{d}{du} \log \chi(u) = \dfrac{\Phi(u + \lambda)}{G(u + \lambda)} - \dfrac{\Phi(u - \lambda)}{G(u - \lambda)} - 2\lambda$.

It follows from the mean value theorem that the right hand side of (3.69) is
never positive if $\dfrac{d}{du}\left(\dfrac{\Phi(u)}{G(u)}\right)$ is equal to or less than 1 for all values $u$. Thus,
we need merely to show that

(3.70)  $\dfrac{d}{du}\left(\dfrac{\Phi(u)}{G(u)}\right) = \dfrac{\Phi'(u) G(u) - G'(u) \Phi(u)}{G^2(u)} = \dfrac{\Phi'(u) G(u) + \Phi^2(u)}{G^2(u)} = \dfrac{\Phi^2(u)}{G^2(u)} - u\, \dfrac{\Phi(u)}{G(u)} \le 1$.

Denote $\dfrac{\Phi(u)}{G(u)}$ by $y$. The roots of the equation $y^2 - uy - 1 = 0$ are

$y = \dfrac{u \pm \sqrt{u^2 + 4}}{2}$.

Hence the inequality $y^2 - uy - 1 \le 0$ holds if and only if

$\dfrac{u - \sqrt{u^2 + 4}}{2} \le y \le \dfrac{u + \sqrt{u^2 + 4}}{2}$.

Since $y$ cannot be negative, this inequality is equivalent to

(3.71)  $y = \dfrac{\Phi(u)}{G(u)} \le \dfrac{u + \sqrt{u^2 + 4}}{2}$.

Thus we have merely to prove (3.71). We shall show that (3.71) holds for
all real values of $u$. Birnbaum has shown [5] that for $u > 0$

(3.72)  $\dfrac{\sqrt{u^2 + 4} - u}{2}\, \Phi(u) \le G(u)$.

Hence

(3.73)  $\dfrac{\Phi(u)}{G(u)} \le \dfrac{2}{\sqrt{u^2 + 4} - u} = \dfrac{\sqrt{u^2 + 4} + u}{2}$  ($u > 0$)

which proves (3.71) for $u > 0$. Now we prove (3.71) for $u < 0$. Let $u = -v$
where $v > 0$. Then it follows from (3.73) that

(3.74)  $\dfrac{\Phi(v)}{G(v)} \le \dfrac{2}{\sqrt{4 + v^2} - v}$.

Taking reciprocals, we obtain from (3.74)

(3.75)  $\dfrac{G(v)}{\Phi(v)} \ge \dfrac{\sqrt{4 + v^2} - v}{2}$.

Since

$\dfrac{G(u)}{\Phi(u)} \ge \dfrac{G(v) + 2v\,\Phi(v)}{\Phi(v)} = \dfrac{G(v)}{\Phi(v)} + 2v$,

we obtain from (3.75)

(3.76)  $\dfrac{G(u)}{\Phi(u)} \ge \dfrac{\sqrt{v^2 + 4} + 3v}{2} \ge \dfrac{\sqrt{v^2 + 4} + v}{2}$.

Taking reciprocals, we obtain

$\dfrac{\Phi(u)}{G(u)} \le \dfrac{2}{\sqrt{v^2 + 4} + v} = \dfrac{\sqrt{v^2 + 4} - v}{2} = \dfrac{\sqrt{u^2 + 4} + u}{2}$.

Hence (3.71) is proved for all values of $u$ and consequently $\delta$ is equal to the value
of the expression (3.67) if we substitute 0 for $u$. Thus,

(3.77)  $\delta = \dfrac{G(-\lambda)}{G(\lambda)}$.
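
The result for the normal case can be checked numerically. The sketch below is illustrative only: the helper names `G`, `chi`, `delta_normal` and `eta_normal` are hypothetical, and the upper tail $G$ is computed from the standard library's normal distribution. It evaluates $\chi(u) = e^{-2\lambda u} G(u-\lambda)/G(u+\lambda)$ at a few points to illustrate that it decreases from its value $G(-\lambda)/G(\lambda)$ at $u = 0$.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def G(x):
    """Upper tail of the standard normal distribution, G(x) = P(T >= x)."""
    return _STD.cdf(-x)          # by symmetry, equal to 1 - cdf(x)


def chi(u, lam):
    """The function chi(u) whose least upper bound over u >= 0 gives delta."""
    return math.exp(-2.0 * lam * u) * G(u - lam) / G(u + lam)


def delta_normal(lam):
    """delta = G(-lambda)/G(lambda), the value of chi at u = 0."""
    return G(-lam) / G(lam)


def eta_normal(lam):
    """eta = 1/delta."""
    return 1.0 / delta_normal(lam)


lam = 0.5   # illustrative value of lambda = |theta|
for u in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"u = {u:3.1f}   chi(u) = {chi(u, lam):.4f}")
print("delta =", delta_normal(lam), " eta =", eta_normal(lam))
```

These values of $\delta$ and $\eta$ can then be inserted in (3.42) or (3.44) to bracket $\gamma$.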


4. The Number of Observations Required by the Sequential Probability
Ratio Test

4.1. Expected number of observations necessary for reaching a decision. As
before, let

$z = \log \dfrac{f_1(x)}{f_0(x)}$,  $z_i = \log \dfrac{f_1(x_i)}{f_0(x_i)}$  ($i = 1, 2, \dots$, ad inf.)

and let $n$ be the number of observations required by the sequential test, i.e., $n$ is
the smallest integer for which $Z_n = z_1 + \cdots + z_n$ is either $\ge \log A$ or $\le \log B$.
To determine the expected value $E(n)$ of $n$ under any hypothesis $H$ we shall
consider a fixed positive integer $N$. The sum $Z_N = z_1 + \cdots + z_N$ can be split
in two parts as follows

(4.1)  $Z_N = Z_n + Z'_n$

where $Z'_n = z_{n+1} + \cdots + z_N$ if $n \le N$ and $Z'_n = Z_N - Z_n$ if $n > N$. Taking
expected values on both sides of (4.1) we obtain

(4.2)  $N\, Ez = EZ_n + EZ'_n$.

Since the probability that $n > N$ converges to zero as $N \to \infty$, and since
$|Z'_n| < 2(\log A + |\log B|)$ if $n > N$, it can be seen that

(4.3)  $\lim_{N \to \infty} \left[ EZ'_n - E(N - n)\, Ez \right] = 0$.

From (4.2) and (4.3) it follows that

(4.4)  $EZ_n = En \cdot Ez$.

Hence

(4.5)  $En = \dfrac{EZ_n}{Ez}$.


Let $E^*Z_n$ be the conditional expected value of $Z_n$ under the restriction that the
sequential analysis leads to the acceptance of $H_0$, i.e., that $Z_n \le \log B$. Similarly,
let $E^{**}Z_n$ be the conditional expected value of $Z_n$ under the restriction that
$H_1$ is accepted, i.e., that $Z_n \ge \log A$. Since $\gamma$ is the probability that $Z_n \ge \log A$,
we have

(4.6)  $EZ_n = (1 - \gamma)\, E^*Z_n + \gamma\, E^{**}Z_n$.

From (4.5) and (4.6) we obtain

(4.7)  $En = \dfrac{(1 - \gamma)\, E^*Z_n + \gamma\, E^{**}Z_n}{Ez}$.

The exact value of $EZ_n$, and therefore also the exact value of $En$, can be
computed if $z$ can take only integral multiples of a constant $d$, since in this case the
exact probability distribution of $Z_n$ was obtained (see equation (3.47)). If $z$
does not satisfy the above restriction, it is still possible to obtain arbitrarily fine
approximations to the value of $EZ_n$, since the distribution of $z$ can be
approximated to any desired degree by a discrete distribution of the type mentioned
above if the constant $d$ is chosen sufficiently small.

If both $|Ez|$ and the standard deviation of $z$ are small, $E^*Z_n$ is very nearly
equal to $\log B$ and $E^{**}Z_n$ is very nearly equal to $\log A$. Hence in this case we
can write

(4.8)  $En \sim \dfrac{(1 - \gamma) \log B + \gamma \log A}{Ez}$.
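
The approximation (4.8) is straightforward to evaluate. The sketch below is an illustration with hypothetical names (`expected_n`) and arbitrary example values; it assumes a normal variate with unit variance, for which $Ez = \tfrac{1}{2}(\theta_1 - \theta_0)^2$ under $H_1$ and $-\tfrac{1}{2}(\theta_1 - \theta_0)^2$ under $H_0$.

```python
import math


def expected_n(gamma, A, B, mean_z):
    """Approximation (4.8): En ~ [(1 - gamma) log B + gamma log A] / Ez,
    neglecting the excess of Z_n over the boundaries."""
    if mean_z == 0:
        raise ValueError("the approximation requires Ez != 0")
    return ((1.0 - gamma) * math.log(B) + gamma * math.log(A)) / mean_z


alpha = beta = 0.05
A = (1 - beta) / alpha
B = beta / (1 - alpha)
diff = 1.0                                        # theta_1 - theta_0 (example)

n_h1 = expected_n(1 - beta, A, B, 0.5 * diff ** 2)    # H1 true: gamma = 1 - beta
n_h0 = expected_n(alpha, A, B, -0.5 * diff ** 2)      # H0 true: gamma = alpha
print(n_h1, n_h0)
```

With these example values both expectations are about 5.3 observations.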

To judge the goodness of the approximation given in (4.8) we shall derive lower
and upper limits for $En$ by deriving lower and upper limits for $E^*Z_n$ and $E^{**}Z_n$.
Let $r$ be a non-negative variable and let

(4.9)  $\xi = \underset{r}{\mathrm{Max}}\; E(z - r \mid z \ge r)$  ($r \ge 0$)

and

(4.10)  $\xi' = \underset{r}{\mathrm{Min}}\; E(z + r \mid z + r \le 0)$  ($r \ge 0$).

It is easy to see that

(4.11)  $\log A \le E^{**}Z_n \le \log A + \xi$

and

(4.12)  $\log B + \xi' \le E^*Z_n \le \log B$.

We obtain from (4.7), (4.11) and (4.12)

(4.13)  $\dfrac{(1 - \gamma)(\log B + \xi') + \gamma \log A}{Ez} \le En \le \dfrac{(1 - \gamma) \log B + \gamma (\log A + \xi)}{Ez}$

if $Ez > 0$, and

(4.14)  $\dfrac{(1 - \gamma) \log B + \gamma (\log A + \xi)}{Ez} \le En \le \dfrac{(1 - \gamma)(\log B + \xi') + \gamma \log A}{Ez}$

if $Ez < 0$.

4.2. Calculation of the quantities $\xi$ and $\xi'$ for binomial and normal distributions.
Let $X$ be a random variable which can take only the values 0 and 1. Let the
probability that $X = 1$ be $p_i$ if $H_i$ is true ($i = 0, 1$), and $p$ if $H$ is true. Denote
$1 - p$ by $q$ and $1 - p_i$ by $q_i$ ($i = 0, 1$). Then $f_i(1) = p_i$, $f_i(0) = q_i$, $f(1) = p$
and $f(0) = q$. It can be assumed without loss of generality that $p_1 > p_0$. It
is clear that $\log \dfrac{f_1(x)}{f_0(x)} > 0$ implies that $x = 1$ and consequently $\log \dfrac{f_1(x)}{f_0(x)} = \log \dfrac{p_1}{p_0}$.
Hence

(4.15)  $\xi = \underset{r}{\mathrm{Max}}\; E(z - r \mid z \ge r) = \log \dfrac{p_1}{p_0}$.

Since $\log \dfrac{f_1(x)}{f_0(x)} < 0$ implies that $x = 0$, we have

(4.16)  $\xi' = \underset{r}{\mathrm{Min}}\; E(z + r \mid z + r \le 0) = \log \dfrac{q_1}{q_0}$.


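Now we shall calculate the values $\xi$ and $\xi'$ if $X$ is normally distributed. Let

$f_i(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta_i)^2}$  ($i = 0, 1$)  ($\theta_1 > \theta_0$)

and

$f(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta)^2}$.

We may assume without loss of generality that $\theta_0 = -\Delta$ and $\theta_1 = \Delta$ where
$\Delta > 0$, since this can always be achieved by a translation. Then

(4.17)  $z = \log \dfrac{f_1(x)}{f_0(x)} = 2\Delta x$.

Denote $\dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ by $\Phi(x)$ and $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_x^{\infty} e^{-t^2/2}\, dt$ by $G(x)$. Let $t = x - \theta$.
Then $z = 2\Delta(t + \theta)$ and

(4.18)  $E(z - r \mid z - r \ge 0) = 2\Delta\, E\!\left(t + \theta - \tfrac{r}{2\Delta} \,\middle|\, t + \theta - \tfrac{r}{2\Delta} \ge 0\right) = \dfrac{2\Delta}{G(t_0)} \displaystyle\int_{t_0}^{\infty} (t - t_0)\, \Phi(t)\, dt = \dfrac{2\Delta}{G(t_0)} \left[ \Phi(t_0) - t_0\, G(t_0) \right]$

where

(4.19)  $t_0 = \dfrac{r}{2\Delta} - \theta$.

In section 3.5 (see equation (3.70)) it was proved that $\dfrac{\Phi(t_0)}{G(t_0)} - t_0$ is a monotonically
decreasing function of $t_0$. Hence the maximum of $E(z - r \mid z - r \ge 0)$
is reached for $r = 0$ and consequently

(4.20)  $\xi = 2\Delta \left[ \dfrac{\Phi(-\theta)}{G(-\theta)} + \theta \right]$.

Now we shall calculate $\xi'$. We have

(4.21)  $\xi' = \underset{r}{\mathrm{Min}}\; E(z + r \mid z + r \le 0) = -\underset{r}{\mathrm{Max}}\; E(-z - r \mid -z - r \ge 0) = -2\Delta\, \underset{r}{\mathrm{Max}}\; E\!\left(-x - \tfrac{r}{2\Delta} \,\middle|\, -x - \tfrac{r}{2\Delta} \ge 0\right)$.

Let $t = -x + \theta$ and $t_0 = \dfrac{r}{2\Delta} + \theta$. Then

(4.22)  $E\!\left(-x - \tfrac{r}{2\Delta} \,\middle|\, -x - \tfrac{r}{2\Delta} \ge 0\right) = E(t - t_0 \mid t - t_0 \ge 0) = \dfrac{1}{G(t_0)} \displaystyle\int_{t_0}^{\infty} (t - t_0)\, \Phi(t)\, dt = \dfrac{\Phi(t_0)}{G(t_0)} - t_0$.

Since this is a monotonically decreasing function of $t_0$, we have

(4.23)  $\underset{r}{\mathrm{Max}}\; E\!\left(-x - \tfrac{r}{2\Delta} \,\middle|\, -x - \tfrac{r}{2\Delta} \ge 0\right) = \dfrac{\Phi(\theta)}{G(\theta)} - \theta$.

From (4.21) and (4.23) we obtain

(4.24)  $\xi' = -2\Delta \left[ \dfrac{\Phi(\theta)}{G(\theta)} - \theta \right]$.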


4.3. Saving in the number of observations as compared with the current test
procedure. We consider the case of a normally distributed variate, such that

$f_i(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta_i)^2}$  ($i = 0, 1$)  ($\theta_1 \ne \theta_0$).

Denote by $n(\alpha, \beta)$ the minimum number of observations necessary in the current
most powerful test for the probabilities of errors of the first and second kinds
to be $\alpha$ and $\beta$, respectively, or less.

We shall calculate the number of observations required by the most powerful
test. It can be assumed without loss of generality that $\theta_0 < \theta_1$. According
to the current most powerful test procedure the hypothesis $H_0$ is accepted if
$\bar{x} < d$ and the hypothesis $H_1$ is accepted if $\bar{x} \ge d$, where $\bar{x}$ is the arithmetic
mean of the observations and $d$ is a properly chosen constant. The probability
of an error of the first kind is given by $G[\sqrt{n}(d - \theta_0)]$ and the probability of an
error of the second kind is given by $1 - G[\sqrt{n}(d - \theta_1)]$ where
$G(t) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_t^{\infty} e^{-x^2/2}\, dx$. To equate these probabilities to $\alpha$ and $\beta$, respectively, the
quantities $d$ and $n$ must satisfy

(4.25)  $G[\sqrt{n}(d - \theta_0)] = \alpha$

and

(4.26)  $1 - G[\sqrt{n}(d - \theta_1)] = \beta$.

Denote by $\lambda_0$ and $\lambda_1$ the values for which $G(\lambda_0) = \alpha$ and $G(\lambda_1) = 1 - \beta$. Then
we have

(4.27)  $\sqrt{n}(d - \theta_0) = \lambda_0$

and

(4.28)  $\sqrt{n}(d - \theta_1) = \lambda_1$.

Subtracting (4.27) from (4.28) we obtain

(4.29)  $\sqrt{n}(\theta_0 - \theta_1) = \lambda_1 - \lambda_0$.

From (4.29)

(4.30)  $n = n(\alpha, \beta) = \dfrac{(\lambda_1 - \lambda_0)^2}{(\theta_0 - \theta_1)^2}$.

If the expression on the right hand side of (4.30) is not an integer, $n(\alpha, \beta)$ is the
smallest integer in excess.
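
A short numerical sketch of (4.30) follows. The function name `fixed_sample_size` is hypothetical, and the inverse of the normal distribution function from the standard library is used to obtain $\lambda_0$ and $\lambda_1$ from $G(\lambda_0) = \alpha$ and $G(\lambda_1) = 1 - \beta$, where $G$ is the upper tail.

```python
import math
from statistics import NormalDist


def fixed_sample_size(alpha, beta, theta0, theta1):
    """Return (exact value of (4.30), smallest integer not below it)."""
    std = NormalDist()
    lam0 = std.inv_cdf(1.0 - alpha)   # G(lam0) = alpha, G being the upper tail
    lam1 = std.inv_cdf(beta)          # G(lam1) = 1 - beta
    exact = (lam1 - lam0) ** 2 / (theta0 - theta1) ** 2
    return exact, math.ceil(exact)


# Example values only: alpha = beta = 0.05 and theta_1 - theta_0 = 1.
print(fixed_sample_size(0.05, 0.05, 0.0, 1.0))
```

For these example values the right hand side of (4.30) is about 10.8, so 11 observations are required.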

In the sequential probability ratio test we put $A = a(\alpha, \beta) = \dfrac{1 - \beta}{\alpha}$ and
$B = b(\alpha, \beta) = \dfrac{\beta}{1 - \alpha}$. Then the probability of an error of the first (second)
kind cannot exceed $\alpha$ ($\beta$) except by a negligible amount. Let $A(\alpha, \beta)$ and
$B(\alpha, \beta)$ be the values of $A$ and $B$ for which the probabilities of errors of the first
and second kinds become exactly equal to $\alpha$ and $\beta$, respectively. It has been
shown in Section 3.2 that $A(\alpha, \beta) \le a(\alpha, \beta)$ and $B(\alpha, \beta) \ge b(\alpha, \beta)$. Thus, the
expected values $E_1(n)$ and $E_0(n)$ are only increased by putting $A = a(\alpha, \beta)$ and
$B = b(\alpha, \beta)$ instead of $A = A(\alpha, \beta)$ and $B = B(\alpha, \beta)$.

Consider the case where $|\theta_1 - \theta_0|$ is small so that the quantities $\xi$ and $\xi'$ can
be neglected. Thus, we shall use the approximation (4.8). Since $\gamma = \alpha$ if $H = H_0$
and $\gamma = 1 - \beta$ if $H = H_1$, we obtain from (4.8)

(4.31)  $E_1(n) = \dfrac{a^*}{E_1(z)} - \beta\, \dfrac{a^* - b^*}{E_1(z)}$

and

(4.32)  $E_0(n) = \dfrac{-b^*}{E_0(-z)} - \alpha\, \dfrac{-b^* + a^*}{E_0(-z)}$

where $a^* = \log a(\alpha, \beta) = \log \dfrac{1 - \beta}{\alpha}$ and $b^* = \log b(\alpha, \beta) = \log \dfrac{\beta}{1 - \alpha}$. Since

(4.33)  $E_1(z) = \tfrac{1}{2}(\theta_0 - \theta_1)^2$

and

(4.34)  $E_0(-z) = \tfrac{1}{2}(\theta_0 - \theta_1)^2$,

it follows from (4.30), (4.31) and (4.32) that $\dfrac{E_1(n)}{n(\alpha, \beta)}$ and $\dfrac{E_0(n)}{n(\alpha, \beta)}$ are independent
of the parameters $\theta_0$ and $\theta_1$.


TABLE 1

Average percentage saving of sequential analysis, as compared with current most
powerful test for testing mean of a normally distributed variate

A. When alternative hypothesis is true:

   β \ α    .01    .02    .03    .04    .05
   .01       58     60     61     62     63
   .02       54     56     57     58     59
   .03       51     53     54     55     55
   .04       49     50     51     52     53
   .05       47     49     50     50     51

B. When null hypothesis is true:

   β \ α    .01    .02    .03    .04    .05
   .01       58     54     51     49     47
   .02       60     56     53     50     49
   .03       61     57     54     51     50
   .04       62     58     55     52     50
   .05       63     59     55     53     51


The average saving of the sequential analysis as compared with the current
method is $100\left(1 - \dfrac{E_1(n)}{n(\alpha, \beta)}\right)$ per cent if $H_1$ is true, and $100\left(1 - \dfrac{E_0(n)}{n(\alpha, \beta)}\right)$ per
cent if $H_0$ is true. In Table 1 the expression $100\left(1 - \dfrac{E_1(n)}{n(\alpha, \beta)}\right)$ is shown in Panel
A, and the expression $100\left(1 - \dfrac{E_0(n)}{n(\alpha, \beta)}\right)$ in Panel B, for several values of $\alpha$ and $\beta$.
Because of the symmetry of the normal distribution, Panel B is obtained from
Panel A simply by interchanging $\alpha$ and $\beta$.

As can be seen from the table, for the range of $\alpha$ and $\beta$ from .01 to .05 (the
range most frequently employed), the sequential process leads to an average
saving of at least 47 per cent in the necessary number of observations as
compared with the current procedure. The true saving is slightly greater than shown
in the table, since $E_i(n)$ calculated under the condition that $A = a(\alpha, \beta)$ and
$B = b(\alpha, \beta)$ is greater than $E_i(n)$ calculated under the condition that $A = A(\alpha, \beta)$
and $B = B(\alpha, \beta)$.
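
The entries of Table 1 can be recomputed from (4.30), (4.31) and (4.32). The sketch below is an illustration with hypothetical function names; it uses the exact right hand side of (4.30) (not rounded up to an integer), so that the factor $(\theta_0 - \theta_1)^2$ cancels from the ratio.

```python
import math
from statistics import NormalDist


def saving_percent_h1(alpha, beta):
    """100 (1 - E1(n)/n(alpha, beta)), using (4.30), (4.31) and (4.33)."""
    std = NormalDist()
    lam0 = std.inv_cdf(1.0 - alpha)          # G(lam0) = alpha
    lam1 = std.inv_cdf(beta)                 # G(lam1) = 1 - beta
    a_star = math.log((1.0 - beta) / alpha)
    b_star = math.log(beta / (1.0 - alpha))
    # E1(n) (theta0 - theta1)^2 = 2 [(1 - beta) a* + beta b*];
    # n(alpha, beta) (theta0 - theta1)^2 = (lam1 - lam0)^2.
    ratio = 2.0 * ((1.0 - beta) * a_star + beta * b_star) / (lam1 - lam0) ** 2
    return 100.0 * (1.0 - ratio)


def saving_percent_h0(alpha, beta):
    """Saving when H0 is true: interchange alpha and beta (normal symmetry)."""
    return saving_percent_h1(beta, alpha)


levels = (0.01, 0.02, 0.03, 0.04, 0.05)
for beta in levels:                           # rows: beta, columns: alpha
    print(beta, [round(saving_percent_h1(alpha, beta)) for alpha in levels])
```

Printing `saving_percent_h0` over the same grid gives the second panel.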

4.4. The characteristic function, the moments and the distribution of the number
of observations necessary for reaching a decision. It was shown in [4] (see
equation (15) in [4]) that the following fundamental identity holds

(4.35)  $E\left[ e^{Z_n t} \left( \varphi(t) \right)^{-n} \right] = 1$  ($\varphi(t) = E e^{zt}$)

for all points $t$ of the complex plane for which $\varphi(t)$ exists and $|\varphi(t)| \ge 1$. The
symbol $n$ denotes the number of observations required by the sequential test,
i.e., $n$ is the smallest positive integer for which $Z_n$ is either $\ge \log A$ or $\le \log B$,
and $\varphi(t)$ denotes the moment generating function of $z$.

On the basis of the identity (4.35) the exact characteristic function of $n$ is
derived in section 7 of [4] in the case when $z$ can take only integral multiples of
a constant. If the number of different values which $Z_n$ can take is large, the
calculation of the exact characteristic function is cumbersome, because a large
number of simultaneous linear equations have to be solved. However, if $|Ez|$
and $\sigma_z$ are small so that $|Z_n - \log A|$ (when $Z_n \ge \log A$) and $|Z_n - \log B|$
(when $Z_n \le \log B$) can be neglected, the calculation of the characteristic
function is much simpler, as was shown in [4]. We shall briefly state the results
obtained in [4]. Let $h$ be the real value $\ne 0$ for which $\varphi(h) = 1$. Furthermore
let $t = t_1(\tau)$ and $t = t_2(\tau)$ be the roots of the equation in $t$

$-\log \varphi(t) = \tau$

such that $\lim_{\tau \to 0} t_1(\tau) = 0$ and $\lim_{\tau \to 0} t_2(\tau) = h$. Finally, let $\psi_1(\tau)$ be the
characteristic function of the conditional distribution of $n$ under the restriction that $Z_n \ge \log A$,
and $\psi_2(\tau)$ the characteristic function of the conditional distribution of $n$
under the restriction that $Z_n \le \log B$. Then, if $|Z_n - \log A|$ (when $Z_n \ge \log A$)
and $|Z_n - \log B|$ (when $Z_n \le \log B$) can be neglected, $\psi_1(\tau)$ and $\psi_2(\tau)$ are
the solutions of the linear equations

(4.36)  $\gamma\, \psi_1(\tau)\, A^{t_1(\tau)} + (1 - \gamma)\, \psi_2(\tau)\, B^{t_1(\tau)} = 1$

and

(4.37)  $\gamma\, \psi_1(\tau)\, A^{t_2(\tau)} + (1 - \gamma)\, \psi_2(\tau)\, B^{t_2(\tau)} = 1$

where

$\gamma = P(Z_n \ge \log A) = \dfrac{1 - B^h}{A^h - B^h}$.

The characteristic function of the unconditional distribution of $n$ is

(4.38)  $\psi(\tau) = \gamma\, \psi_1(\tau) + (1 - \gamma)\, \psi_2(\tau)$.

As an illustration we shall determine $\psi_1(\tau)$, $\psi_2(\tau)$ and $\psi(\tau)$ when $z$ has a normal
distribution. Then we have

$\log \varphi(t) = (Ez)\, t + \dfrac{\sigma_z^2}{2}\, t^2$.

Hence

$h = -\dfrac{2\, Ez}{\sigma_z^2}$,

(4.39)  $t_1(\tau) = \dfrac{1}{\sigma_z^2} \left( -Ez + \sqrt{(Ez)^2 - 2\sigma_z^2 \tau} \right)$,

(4.40)  $t_2(\tau) = \dfrac{1}{\sigma_z^2} \left( -Ez - \sqrt{(Ez)^2 - 2\sigma_z^2 \tau} \right)$.


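From (4.36), (4.37) and (4.38) we obtain

(4.41)  $\gamma\, \psi_1(\tau) = \dfrac{B^{g_2} - B^{g_1}}{A^{g_1} B^{g_2} - A^{g_2} B^{g_1}}$,

(4.42)  $(1 - \gamma)\, \psi_2(\tau) = \dfrac{A^{g_1} - A^{g_2}}{A^{g_1} B^{g_2} - A^{g_2} B^{g_1}}$

and

(4.43)  $\psi(\tau) = \dfrac{A^{g_1} + B^{g_2} - A^{g_2} - B^{g_1}}{A^{g_1} B^{g_2} - A^{g_2} B^{g_1}}$

where

(4.44)  $g_1 = \dfrac{1}{\sigma_z^2} \left( -Ez + \sqrt{(Ez)^2 - 2\sigma_z^2 \tau} \right)$

and

(4.45)  $g_2 = \dfrac{1}{\sigma_z^2} \left( -Ez - \sqrt{(Ez)^2 - 2\sigma_z^2 \tau} \right)$.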


For any positive integer $r$ the $r$-th moment of $n$, i.e., $E(n^r)$, is equal to the $r$-th
derivative of $\psi(\tau)$ taken at $\tau = 0$. Let $E^*(n^r)$ be the conditional expected value
of $n^r$ under the restriction that $Z_n \le \log B$, and let $E^{**}(n^r)$ be the conditional
expected value of $n^r$ under the restriction that $Z_n \ge \log A$. Then

(4.46)  $E^*(n^r) = \left. \dfrac{d^r \psi_2(\tau)}{d\tau^r} \right|_{\tau = 0}$  and  $E^{**}(n^r) = \left. \dfrac{d^r \psi_1(\tau)}{d\tau^r} \right|_{\tau = 0}$.

It may be of interest to note that $\left. \dfrac{d^r \psi_k(\tau)}{d\tau^r} \right|_{\tau = 0}$ ($k = 1, 2$) and therefore also the
moments of $n$ can be obtained from the identity (4.35) directly by successive
differentiation. In fact, the identity (4.35) can be written as (neglecting the
excess of $Z_n$ over the boundaries $\log A$ and $\log B$)

(4.47)  $\gamma\, A^t\, \psi_1[-\log \varphi(t)] + (1 - \gamma)\, B^t\, \psi_2[-\log \varphi(t)] = 1$.

Taking the first $r$ derivatives of (4.47) with respect to $t$ at $t = 0$ and $t = h$
we obtain a system of $2r$ linear equations in the $2r$ unknowns $\left. \dfrac{d^j \psi_k(\tau)}{d\tau^j} \right|_{\tau = 0}$
($k = 1, 2$; $j = 1, \dots, r$) from which these unknowns can be determined. For example,
$\left. \dfrac{d\psi_k(\tau)}{d\tau} \right|_{\tau = 0}$ ($k = 1, 2$) can be determined as follows: Taking the first derivative
of (4.47) with respect to $t$ and denoting $\dfrac{d^r \psi_k(\tau)}{d\tau^r}$ by $\psi_k^{(r)}(\tau)$ we obtain

(4.48)  $\gamma (\log A) A^t\, \psi_1[-\log \varphi(t)] - \gamma A^t\, \dfrac{\varphi'(t)}{\varphi(t)}\, \psi_1^{(1)}[-\log \varphi(t)] + (1 - \gamma)(\log B) B^t\, \psi_2[-\log \varphi(t)] - (1 - \gamma) B^t\, \dfrac{\varphi'(t)}{\varphi(t)}\, \psi_2^{(1)}[-\log \varphi(t)] = 0$.

Putting $t = 0$ and $t = h$ we obtain the equations

(4.49)  $\gamma \log A - \gamma\, \dfrac{\varphi'(0)}{\varphi(0)}\, \psi_1^{(1)}(0) + (1 - \gamma) \log B - (1 - \gamma)\, \dfrac{\varphi'(0)}{\varphi(0)}\, \psi_2^{(1)}(0) = 0$

and

(4.50)  $\gamma (\log A) A^h - \gamma A^h\, \dfrac{\varphi'(h)}{\varphi(h)}\, \psi_1^{(1)}(0) + (1 - \gamma)(\log B) B^h - (1 - \gamma) B^h\, \dfrac{\varphi'(h)}{\varphi(h)}\, \psi_2^{(1)}(0) = 0$

from which $\psi_1^{(1)}(0)$ and $\psi_2^{(1)}(0)$ can be determined.

The distribution of $n$ can be obtained by inverting the characteristic function
$\psi(\tau)$. This was done in [4] (neglecting the excess of $Z_n$ over $\log A$ and $\log B$)
in the case when $z$ is normally distributed. The results obtained in [4] can be
briefly stated as follows: If $B = 0$, or if $B > 0$ and $A = \infty$, the distribution
of $n$ is a simple elementary function. If $B = 0$ and $Ez > 0$, the distribution of
$m = \dfrac{(Ez)^2}{2\sigma_z^2}\,n$ is given by

(4.51)    $F(m)\,dm = \dfrac{c}{2\Gamma(\tfrac{1}{2})\,m^{3/2}}\;e^{-\frac{c^2}{4m} - m + c}\,dm \qquad (0 \le m < \infty)$

where

(4.52)    $c = \dfrac{1}{\sigma_z^2}\,(Ez)\log A.$

If $B > 0$, $A = \infty$ and $Ez < 0$ the distribution of $m = \dfrac{(Ez)^2}{2\sigma_z^2}\,n$ is given by the
expression we obtain from (4.51) if we substitute $\dfrac{1}{\sigma_z^2}\,(Ez)\log B$ for $c$.

If $B > 0$ and $A < \infty$, the distribution of $m$ is given by an infinite series where
each term is of the form (4.51) (see equation (76) in [4]).

Since $m$ is a discrete variable, it may seem paradoxical that we obtained a
probability density function for $m$. However, the explanation lies in the fact
that we neglected the excess of $Z_n$ over $\log A$ and $\log B$, which is zero only in the
limiting case when $Ez$ and $\sigma_z$ approach zero.

The distribution of $m$ given in (4.51) can be used as a good approximation
to the exact distribution of $m$ even if $B > 0$, provided that the probability that
$Z_n \ge \log A$ is nearly equal to 1.

It was pointed out in [4] that if $|Ez|$ and $\sigma_z$ are sufficiently small, the distribution
of $n$ determined under the assumption that $z$ is normally distributed will
be a good approximation to the exact distribution of $n$ even if $z$ is not normally
distributed.
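
As a numerical sanity check on the density (4.51), the following Python sketch integrates it with a plain midpoint rule. It assumes the density reads $c\,m^{-3/2}e^{-c^2/(4m) - m + c}/(2\Gamma(\tfrac{1}{2}))$ with $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$; the function names, the value $c = 2$ and the integration grid are illustrative choices, not part of the paper.

```python
from math import exp, pi, sqrt


def density_m(m, c):
    """Density (4.51) as read here; Gamma(1/2) is sqrt(pi)."""
    return c / (2.0 * sqrt(pi) * m ** 1.5) * exp(-c * c / (4.0 * m) - m + c)


def midpoint_integral(func, upper, steps):
    """Midpoint rule on (0, upper); avoids evaluating func at m = 0."""
    width = upper / steps
    return sum(func((k + 0.5) * width) for k in range(steps)) * width


c = 2.0  # illustrative value of c = (Ez) log A / sigma_z^2
total = midpoint_integral(lambda m: density_m(m, c), 60.0, 60000)
mean_m = midpoint_integral(lambda m: m * density_m(m, c), 60.0, 60000)
print(total, mean_m)
```

Under these assumptions the total mass should come out close to 1 and the mean of $m$ close to $c/2$, which corresponds to $E(n) = \log A / Ez$ when the excess over the boundary is neglected.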

4.5. Lower limit of the probability that the sequential process will terminate with
a number of trials less than or equal to a given number. Let $P_i(n_0)$ be the probability
that the sequential process will terminate at a value $n \le n_0$, calculated
under $H_i$ $(i = 0, 1)$. Let

    $\bar{P}_0(n_0) = P_0\left[\sum_{\alpha=1}^{n_0} z_\alpha \le \log B\right],$
    $\bar{P}_1(n_0) = P_1\left[\sum_{\alpha=1}^{n_0} z_\alpha \ge \log A\right].$

It is clear that

    $\bar{P}_i(n_0) \le P_i(n_0) \qquad (i = 0, 1).$

For calculating $\bar{P}_i(n_0)$ we shall assume that $n_0$ is sufficiently large so that
$\sum_{\alpha=1}^{n_0} z_\alpha$ can be regarded as normally distributed. Let $G(\lambda)$ be defined by

    $G(\lambda) = \dfrac{1}{\sqrt{2\pi}}\int_\lambda^\infty e^{-t^2/2}\,dt.$

Furthermore, let

    $\lambda_1(n_0) = \dfrac{\log A - n_0E_1(z)}{\sqrt{n_0}\,\sigma_1(z)},$

(4.58)    $\lambda_0(n_0) = \dfrac{\log B - n_0E_0(z)}{\sqrt{n_0}\,\sigma_0(z)},$

where $\sigma_i(z)$ is the standard deviation of $z$ under $H_i$. Then

(4.59)    $\bar{P}_1(n_0) = G[\lambda_1(n_0)]$

and

(4.60)    $\bar{P}_0(n_0) = 1 - G[\lambda_0(n_0)].$

Hence we have the inequalities

(4.61)    $P_1(n_0) \ge G[\lambda_1(n_0)]$

and

(4.62)    $P_0(n_0) \ge 1 - G[\lambda_0(n_0)].$

Putting $\log A = \log\dfrac{1-\beta}{\alpha}$ and $\log B = \log\dfrac{\beta}{1-\alpha}$, Table 2 shows the values
of $\bar{P}_1(n_0)$ and $\bar{P}_0(n_0)$ corresponding to different pairs $(\alpha, \beta)$ and different values
of $n_0$. In these calculations it has been assumed that the distribution under
$H_0$ is a normal distribution with mean zero and unit variance, and the distribution
under $H_1$ is a normal distribution with mean $\theta$ and unit variance. For each pair
$(\alpha, \beta)$ the value of $\theta$ was determined so that the number of observations required
by the current most powerful test of strength $(\alpha, \beta)$ is equal to 1000.

TABLE 2

Lower bound of the probability* that a sequential analysis will terminate within
various numbers of trials, when the most powerful current
test requires exactly 1000 trials

                 α = .01 and β = .01        α = .01 and β = .05        α = .05 and β = .05
Number of     Alternative     Null       Alternative     Null       Alternative     Null
trials        hypothesis   hypothesis    hypothesis   hypothesis    hypothesis   hypothesis
                 true         true          true         true          true         true

  1000           .910         .910          .799         .891          .773         .773
  1200           .950         .950          .871         .932          .837         .837
  1400           .972         .972          .916         .957          .883         .883
  1600           .985         .985          .946         .972          .915         .915
  1800           .991         .991          .965         .982          .938         .938
  2000           .995         .995          .977         .989          .955         .955
  2200           .997         .997          .985         .993          .967         .967
  2400           .999         .999          .990         .995          .976         .976
  2600           .999         .999          .994         .997          .982         .982
  2800          1.00         1.00           .996         .998          .987         .987
  3000          1.00         1.00           .997         .999          .990         .990

* The probabilities given are lower bounds for the true probabilities. They
relate to a test of the mean of a normally distributed variate, the difference between
the null and alternative hypothesis being adjusted for each pair of values
of α and β so that the number of trials required under the most powerful current
test is exactly 1000.
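
The lower bounds (4.61) and (4.62) are easy to evaluate numerically. The Python sketch below does so under the assumptions stated for Table 2 (unit-variance normal observations with mean 0 under $H_0$ and mean $\theta$ under $H_1$, so that $z = \theta x - \theta^2/2$, $E_1(z) = -E_0(z) = \theta^2/2$ and $\sigma_i(z) = \theta$). Taking $\theta = (\lambda_\alpha + \lambda_\beta)/\sqrt{1000}$, the usual one-sided fixed-sample relation, is an assumption made here to match the phrase "requires exactly 1000 trials"; the function names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_tail(value):
    """G(lambda): probability that a standard normal variate exceeds value."""
    return 1.0 - STD_NORMAL.cdf(value)


def theta_for_fixed_sample(alpha, beta, n_fixed=1000):
    """Mean shift for which the fixed-sample test of strength (alpha, beta) needs n_fixed trials."""
    return (STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(1 - beta)) / sqrt(n_fixed)


def termination_lower_bounds(alpha, beta, n0, n_fixed=1000):
    """Return (bound under H1, bound under H0) from (4.61) and (4.62)."""
    theta = theta_for_fixed_sample(alpha, beta, n_fixed)
    log_a = log((1 - beta) / alpha)
    log_b = log(beta / (1 - alpha))
    mean_z1 = theta ** 2 / 2      # E_1(z)
    mean_z0 = -theta ** 2 / 2     # E_0(z)
    sd_z = theta                  # standard deviation of z under either hypothesis
    lam1 = (log_a - n0 * mean_z1) / (sqrt(n0) * sd_z)
    lam0 = (log_b - n0 * mean_z0) / (sqrt(n0) * sd_z)
    return upper_tail(lam1), 1.0 - upper_tail(lam0)


print(termination_lower_bounds(0.01, 0.01, 1000))
print(termination_lower_bounds(0.01, 0.05, 1000))
```

With these assumptions the printed values agree with the first row of Table 2 to about three decimal places.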


4.6. Truncated sequential analysis. In some applications a definite upper
bound for the number of observations may be desirable. Thus, a certain
integer $n_0$ is chosen so that if the sequential process does not lead to a final
decision for $n < n_0$, a new rule is given for the acceptance or rejection of $H_0$
at the stage $n = n_0$.

A simple and reasonable rule for the acceptance or rejection of $H_0$ at the stage
$n = n_0$ can be given as follows: If $\sum_{\alpha=1}^{n_0} z_\alpha \le 0$ we accept $H_0$ and if $\sum_{\alpha=1}^{n_0} z_\alpha > 0$
we accept $H_1$. By thus truncating the sequential process we change, however,
the probabilities of errors of the first and second kinds. Let $\alpha$ and $\beta$ be the
probabilities of errors of the first and second kinds, respectively, if the sequential
test is not truncated. Let $\alpha(n_0)$ and $\beta(n_0)$ be the probabilities of errors of the
first and second kinds if the test is truncated at $n = n_0$. We shall derive upper
bounds for $\alpha(n_0)$ and $\beta(n_0)$.

First we shall derive an upper bound for $\alpha(n_0)$. Let $\rho_0(n_0)$ be the probability
(under the null hypothesis) that the following three conditions are simultaneously
fulfilled:

(i) $\log B < \sum_{\alpha=1}^{n} z_\alpha < \log A$ for $n = 1, \ldots, n_0 - 1$

(ii) $0 < \sum_{\alpha=1}^{n_0} z_\alpha < \log A$

(iii) continuing the sequential process beyond $n_0$, it terminates with the
acceptance of $H_0$.

It is clear that

(4.63)    $\alpha(n_0) \le \alpha + \rho_0(n_0).$

Let $\bar{\rho}_0(n_0)$ be the probability (under the null hypothesis) that $0 < \sum_{\alpha=1}^{n_0} z_\alpha < \log A$.
Then obviously

    $\rho_0(n_0) \le \bar{\rho}_0(n_0)$

and consequently

(4.64)    $\alpha(n_0) \le \alpha + \bar{\rho}_0(n_0).$

Let $\rho_1(n_0)$ be the probability under the alternative hypothesis that the following
three conditions are simultaneously fulfilled:

(i) $\log B < \sum_{\alpha=1}^{n} z_\alpha < \log A$ for $n = 1, \ldots, n_0 - 1$

(ii) $\log B < \sum_{\alpha=1}^{n_0} z_\alpha \le 0$

(iii) continuing the sequential process beyond $n_0$, it terminates with the
acceptance of $H_1$.

It is clear that

(4.65)    $\beta(n_0) \le \beta + \rho_1(n_0).$

Let $\bar{\rho}_1(n_0)$ be the probability (under the alternative hypothesis) that
$\log B < \sum_{\alpha=1}^{n_0} z_\alpha \le 0$. Then $\rho_1(n_0) \le \bar{\rho}_1(n_0)$ and consequently

(4.66)    $\beta(n_0) \le \beta + \bar{\rho}_1(n_0).$

Let

    $\nu_1 = \dfrac{-n_0E_0(z)}{\sqrt{n_0}\,\sigma_0(z)}, \qquad \nu_2 = \dfrac{\log A - n_0E_0(z)}{\sqrt{n_0}\,\sigma_0(z)},$

    $\nu_3 = \dfrac{-n_0E_1(z)}{\sqrt{n_0}\,\sigma_1(z)}, \qquad \nu_4 = \dfrac{\log B - n_0E_1(z)}{\sqrt{n_0}\,\sigma_1(z)},$

where $\sigma_i(z)$ is the standard deviation of $z$ under $H_i$ $(i = 0, 1)$. Then

(4.67)    $\bar{\rho}_0(n_0) = G(\nu_1) - G(\nu_2)$

and

(4.68)    $\bar{\rho}_1(n_0) = G(\nu_4) - G(\nu_3).$

From (4.64), (4.66), (4.67) and (4.68) we obtain

(4.69)    $\alpha(n_0) \le \alpha + G(\nu_1) - G(\nu_2)$

and

(4.70)    $\beta(n_0) \le \beta + G(\nu_4) - G(\nu_3).$

The upper bounds given in (4.69) and (4.70) may considerably exceed $\alpha(n_0)$
and $\beta(n_0)$, respectively. It would be desirable to find closer limits.

Table 3 shows the values of the upper bounds of $\alpha(n_0)$ and $\beta(n_0)$ given by
formulas (4.69) and (4.70) corresponding to different pairs $(\alpha, \beta)$ and different values
of $n_0$. In these calculations we have put $\log A = \log\dfrac{1-\beta}{\alpha}$, $\log B = \log\dfrac{\beta}{1-\alpha}$
and assumed that the distribution under $H_0$ is a normal distribution with mean
zero and unit variance, and the distribution under $H_1$ is a normal distribution
with mean $\theta$ and unit variance. For each pair $(\alpha, \beta)$ the value of $\theta$ has been
determined so that the number of observations required by the current most
powerful test of strength $(\alpha, \beta)$ is equal to 1000.

It seems to the author that the upper limits given in (4.69) and (4.70) are
considerably above the true $\alpha(n_0)$ and $\beta(n_0)$ respectively, when $n_0$ is not much
higher than the value of $n$ needed for the current most powerful test.
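
The bounds (4.69) and (4.70) can be evaluated in the same way. The following Python sketch assumes the normal-mean setting described for Table 3, with $\theta = (\lambda_\alpha + \lambda_\beta)/\sqrt{1000}$ taken as the mean shift for which the fixed-sample test needs 1000 trials (an assumption of this sketch), and with $G$ the upper tail of the standard normal distribution. All function names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_tail(value):
    """G(nu): probability that a standard normal variate exceeds value."""
    return 1.0 - STD_NORMAL.cdf(value)


def truncation_upper_bounds(alpha, beta, n0, n_fixed=1000):
    """Return the right-hand sides of (4.69) and (4.70) for truncation at n0."""
    theta = (STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(1 - beta)) / sqrt(n_fixed)
    log_a = log((1 - beta) / alpha)
    log_b = log(beta / (1 - alpha))
    mean_z0, mean_z1 = -theta ** 2 / 2, theta ** 2 / 2
    scale = sqrt(n0) * theta               # sqrt(n0) * sigma_i(z), with sigma_i(z) = theta
    nu1 = -n0 * mean_z0 / scale
    nu2 = (log_a - n0 * mean_z0) / scale
    nu3 = -n0 * mean_z1 / scale
    nu4 = (log_b - n0 * mean_z1) / scale
    alpha_bound = alpha + upper_tail(nu1) - upper_tail(nu2)
    beta_bound = beta + upper_tail(nu4) - upper_tail(nu3)
    return alpha_bound, beta_bound


print(truncation_upper_bounds(0.01, 0.01, 1000))
print(truncation_upper_bounds(0.01, 0.05, 1000))
```

For the two cases printed, the results agree with the first row of Table 3 to within about one unit in the third decimal place.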

4.7. Efficiency of the sequential probability ratio test. Let $S$ be any sequential
test for which the probability of an error of the first kind is $\alpha$, the probability
of an error of the second kind is $\beta$ and the probability that the test
procedure will eventually terminate is one. Let $S'$ be the sequential probability
ratio test whose strength is equal to that of $S$. We shall prove that the
sequential probability ratio test is an optimum test, i.e., that $E_i(n \mid S) \ge
E_i(n \mid S')$ $(i = 0, 1)$, if for $S'$ the excess of $Z_n$ over $\log A$ and $\log B$ can be neglected.
This excess is exactly zero if $z$ can take only the values $d$ and $-d$
and if $\log A$ and $\log B$ are integral multiples of $d$. In any other case the excess
will not be identically zero. However, if $|Ez|$ and $\sigma_z$ are sufficiently small,
the excess of $Z_n$ over $\log A$ and $\log B$ is negligible.

For any random variable $u$ we shall denote by $E_i^*(u \mid S)$ the conditional
expected value of $u$ under the hypothesis $H_i$ $(i = 0, 1)$ and under the restriction
that $H_0$ is accepted. Similarly, let $E_i^{**}(u \mid S)$ be the conditional expected value
of $u$ under the hypothesis $H_i$ $(i = 0, 1)$ and under the restriction that $H_1$ is
accepted. In the notations for these expected values the symbol $S$ stands for

TABLE 3

Effect on risks of error of truncating* a sequential analysis at a predetermined
number of trials

                 α = .01 and β = .01        α = .01 and β = .05        α = .05 and β = .05
Number of     Upper bound  Upper bound   Upper bound  Upper bound   Upper bound  Upper bound
trials        of effective of effective  of effective of effective  of effective of effective
                   α            β             α            β             α            β

  1000           .020         .020          .033         .070          .095         .095
  1200           .015         .015          .024         .063          .082         .082
  1400           .013         .013          .019         .058          .072         .072
  1600           .012         .012          .016         .055          .066         .066
  1800           .011         .011          .014         .053          .062         .062
  2000           .010         .010          .012         .052          .058         .058
  2200           .010         .010          .012         .051          .056         .056
  2400           .010         .010          .011         .051          .055         .055
  2600           .010         .010          .011         .051          .053         .053
  2800           .010         .010          .010         .050          .053         .053
  3000           .010         .010          .010         .050          .052         .052

* If the sequential analysis is based on the values α and β shown, but a decision
is made at $n_0$ trials even when the normal sequential criteria would require
a continuation of the process, the realized values of α and β will not exceed the
tabular entries. The table relates to a test of the mean of a normally distributed
variate, the difference between the null and alternative hypotheses being adjusted
for each pair (α, β) so that the number of trials required by the current
test is 1000.

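the sequential test used. Denote by $Q_i(S)$ the totality of all samples for which
the test $S$ leads to the acceptance of $H_i$. Then we have

(4.71)    $E_0^*\left(\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right) = \dfrac{P_1[Q_0(S)]}{P_0[Q_0(S)]} = \dfrac{\beta}{1-\alpha},$

(4.72)    $E_0^{**}\left(\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right) = \dfrac{P_1[Q_1(S)]}{P_0[Q_1(S)]} = \dfrac{1-\beta}{\alpha},$

(4.73)    $E_1^*\left(\dfrac{p_{0n}}{p_{1n}} \,\Big|\, S\right) = \dfrac{P_0[Q_0(S)]}{P_1[Q_0(S)]} = \dfrac{1-\alpha}{\beta},$

and

(4.74)    $E_1^{**}\left(\dfrac{p_{0n}}{p_{1n}} \,\Big|\, S\right) = \dfrac{P_0[Q_1(S)]}{P_1[Q_1(S)]} = \dfrac{\alpha}{1-\beta}.$

To prove the efficiency of the sequential probability ratio test, we shall first
derive two lemmas.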

Lemma 1. For any random variable $u$ the inequality

(4.75)    $e^{Eu} \le Ee^{u}$

holds.

Proof: Inequality (4.75) can be written as

(4.76)    $1 \le Ee^{u'}$

where $u' = u - Eu$. Lemma 1 is proved if we show that (4.76) holds for any
random variable $u'$ with zero mean. Expanding $e^{u'}$ in a Taylor series around
$u' = 0$, we obtain

(4.77)    $e^{u'} = 1 + u' + \tfrac{1}{2}u'^2e^{\xi(u')}$, where $\xi(u')$ lies between $0$ and $u'$.

Hence, since $Eu' = 0$,

(4.78)    $Ee^{u'} = 1 + \tfrac{1}{2}E\left[u'^2e^{\xi(u')}\right] \ge 1$

and Lemma 1 is proved.

Lemma 2. Let $S$ be a sequential test such that there exists a finite integer $N$ with
the property that the number $n$ of observations required for the test is $\le N$. Then

(4.79)    $E_i(n \mid S) = \dfrac{E_i(Z_n \mid S)}{E_i(z)} \qquad (i = 0, 1).$

The proof is omitted, since it is essentially the same as that of equation (4.5)
for the sequential probability ratio test.

On the basis of Lemmas 1 and 2 we shall be able to derive the following 
theorem. 

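Theorem. Let $S$ be any sequential test for which the probability of an error
of the first kind is $\alpha$, the probability of an error of the second kind is $\beta$, and the
probability that the test procedure will eventually terminate is equal to one. Then

(4.80)    $E_0(n \mid S) \ge \dfrac{1}{E_0(z)}\left[(1-\alpha)\log\dfrac{\beta}{1-\alpha} + \alpha\log\dfrac{1-\beta}{\alpha}\right]$

and

(4.81)    $E_1(n \mid S) \ge \dfrac{1}{E_1(z)}\left[\beta\log\dfrac{\beta}{1-\alpha} + (1-\beta)\log\dfrac{1-\beta}{\alpha}\right].$

Proof: First we shall prove the theorem in the case when there exists a finite
integer $N$ such that $n$ never exceeds $N$. According to Lemma 2 we have

(4.82)    $E_0(n \mid S) = \dfrac{1}{E_0(z)}\left[(1-\alpha)\,E_0^*\left(\log\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right) + \alpha\,E_0^{**}\left(\log\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right)\right]$

and

(4.83)    $E_1(n \mid S) = \dfrac{1}{E_1(z)}\left[\beta\,E_1^*\left(\log\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right) + (1-\beta)\,E_1^{**}\left(\log\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right)\right].$

From equations (4.71)-(4.74) and Lemma 1 we obtain the inequalities

(4.84)    $E_0^*\left(\log\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right) \le \log\dfrac{\beta}{1-\alpha},$

(4.85)    $E_0^{**}\left(\log\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right) \le \log\dfrac{1-\beta}{\alpha},$

(4.86)    $E_1^*\left(\log\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right) = -E_1^*\left(\log\dfrac{p_{0n}}{p_{1n}} \,\Big|\, S\right) \ge \log\dfrac{\beta}{1-\alpha},$

and

(4.87)    $E_1^{**}\left(\log\dfrac{p_{1n}}{p_{0n}} \,\Big|\, S\right) = -E_1^{**}\left(\log\dfrac{p_{0n}}{p_{1n}} \,\Big|\, S\right) \ge \log\dfrac{1-\beta}{\alpha}.$

Since $E_0(z) < 0$, (4.80) follows from (4.82), (4.84) and (4.85). Similarly, since
$E_1(z) > 0$, (4.81) follows from (4.83), (4.86) and (4.87). This proves the theorem
when there exists a finite integer $N$ such that $n \le N$.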

To prove the theorem for any sequential test $S$ of strength $(\alpha, \beta)$, for any
positive integer $N$ let $S_N$ be the sequential test we obtain by truncating $S$ at the
$N$-th observation if no decision is reached before the $N$-th observation. Let
$(\alpha_N, \beta_N)$ be the strength of $S_N$. Then we have

(4.88)    $E_0(n \mid S) \ge E_0(n \mid S_N) \ge \dfrac{1}{E_0(z)}\left[(1-\alpha_N)\log\dfrac{\beta_N}{1-\alpha_N} + \alpha_N\log\dfrac{1-\beta_N}{\alpha_N}\right]$

and

(4.89)    $E_1(n \mid S) \ge E_1(n \mid S_N) \ge \dfrac{1}{E_1(z)}\left[\beta_N\log\dfrac{\beta_N}{1-\alpha_N} + (1-\beta_N)\log\dfrac{1-\beta_N}{\alpha_N}\right].$

Since $\lim_{N\to\infty}\alpha_N = \alpha$ and $\lim_{N\to\infty}\beta_N = \beta$, inequalities (4.80) and (4.81) follow from
(4.88) and (4.89). Hence the proof of the theorem is completed.
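
To give a feel for the size of the lower bounds (4.80) and (4.81), the Python sketch below evaluates them for the normal-mean example used in Tables 2 and 3. The setting ($E_1(z) = -E_0(z) = \theta^2/2$, with $\theta = (\lambda_\alpha + \lambda_\beta)/\sqrt{1000}$) and the function name are assumptions of this sketch, not statements from the text.

```python
from math import log, sqrt
from statistics import NormalDist


def expected_n_lower_bounds(alpha, beta, mean_z0, mean_z1):
    """Right-hand sides of (4.80) and (4.81); requires mean_z0 < 0 < mean_z1."""
    log_a = log((1 - beta) / alpha)
    log_b = log(beta / (1 - alpha))
    bound_h0 = ((1 - alpha) * log_b + alpha * log_a) / mean_z0
    bound_h1 = (beta * log_b + (1 - beta) * log_a) / mean_z1
    return bound_h0, bound_h1


alpha = beta = 0.01
normal = NormalDist()
# Mean shift for which a fixed-sample test of strength (alpha, beta) needs 1000 trials.
theta = (normal.inv_cdf(1 - alpha) + normal.inv_cdf(1 - beta)) / sqrt(1000)
bound_h0, bound_h1 = expected_n_lower_bounds(alpha, beta, -theta ** 2 / 2, theta ** 2 / 2)
print(bound_h0, bound_h1)
```

In this example both bounds come out at roughly 416 observations, compared with the 1000 observations of the fixed-sample test.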

If for the sequential probability ratio test $S'$ the excess of the cumulative sum
$Z_n$ over the boundaries $\log A$ and $\log B$ is zero, $E_0(n \mid S')$ is exactly equal to the
right hand member of (4.80) and $E_1(n \mid S')$ is exactly equal to the right hand
member of (4.81). Hence, in this case $S'$ is exactly an optimum test.
If both $|Ez|$ and $\sigma_z$ are small, the expected value of the excess over the
boundaries will also be small and, therefore, $E_0(n \mid S')$ and $E_1(n \mid S')$ will be only
slightly larger than the right hand members of (4.80) and (4.81), respectively.
Thus, in such a case the sequential probability ratio test is, if not exactly, very
nearly an optimum test.¹²


¹² The author conjectures that the sequential probability ratio test is exactly an optimum
test even if the excess of $Z_n$ over the boundaries is not zero. However, he did not
succeed in proving this.

Part II. Sequential Test of a Simple or Composite Hypothesis Against a Set of Alternatives

In Part I we have dealt with the problem of testing a simple hypothesis $H_0$
against a single alternative $H_1$. Here we shall consider the problem of testing
a simple or composite hypothesis against a set of infinitely many alternatives.
By a simple hypothesis we mean a hypothesis which specifies uniquely the
probability distribution of the random variable $x$ under consideration. A
hypothesis is called composite if it is not simple.

5. Test of a Simple Hypothesis Against One-sided Alternatives

5.1. General remarks. Let $f(x, \theta)$ be the probability density function of a
random variable $X$, where $\theta$ is an unknown parameter. Suppose that it is required
to test the simple hypothesis that $\theta = \theta_0$ and that the alternative values
of $\theta$ are restricted to values $\theta > \theta_0$. Assume that it is desired to have a sequential
test such that the probability of an error of the first kind is equal to a given $\alpha$.

The probability of an error of the second kind is no longer a single value, but
is a function of the true value of $\theta$. If $f(x, \theta)$ is a continuous function of $x$ and
$\theta$, the probability of an error of the second kind will be arbitrarily near $1 - \alpha$
if the true value of $\theta$ is sufficiently near $\theta_0$. Hence, if $\alpha$ is small, the probability
of an error of the second kind is necessarily large when the true value of $\theta$
is very near $\theta_0$. In most practical applications we do not care if the probability
of an error of the second kind is high when the true value of $\theta$ is very
near $\theta_0$, since in this case the error committed by accepting $\theta_0$ is usually of very
little importance. However, there will be a value $\theta_1 > \theta_0$ such that we wish the
probability of an error of the second kind to be less than or equal to a given small
positive value $\beta$ whenever the true value of $\theta$ is greater than or equal to $\theta_1$.

In this case we can proceed as follows: Consider the single alternative hypothesis
$H_1$ that $\theta = \theta_1$. Construct a sequential test for testing $\theta = \theta_0$ against the
single alternative $H_1$ such that the probability of an error of the first kind is $\alpha$
and the probability of an error of the second kind, i.e., the probability of accepting
$\theta_0$ when $\theta_1$ is true, is $\beta$. If this sequential test has the further property
that the probability of an error of the second kind is less than or equal to $\beta$
whenever the true value of $\theta$ is greater than $\theta_1$, then this sequential test provides
a satisfactory solution of the problem of testing the hypothesis that $\theta = \theta_0$
against the set of alternatives $\theta > \theta_0$.

In most of the important cases occurring in practice, such as when $X$ has a
normal, binomial, or Poisson distribution, etc., the sequential probability ratio
test for testing the hypothesis that $\theta = \theta_0$ against a single alternative $\theta_1$ $(\theta_1 > \theta_0)$
satisfies the condition that the probability of an error of the second kind is a
monotonically decreasing function of $\theta$ in the domain $\theta > \theta_0$. Thus, in all these
cases the sequential probability ratio test for testing the hypothesis that $\theta = \theta_0$
against a properly chosen alternative $\theta_1$ provides a satisfactory solution of our
problem.

The case in which the alternative values of $\theta$ are restricted to values less than
$\theta_0$ is entirely analogous to that in which the alternatives are restricted to values
greater than $\theta_0$, and need not be discussed separately.

It should be pointed out that the test procedure for testing $\theta = \theta_0$ against
alternatives $\theta > \theta_0$, as described in this section, is also suitable for testing the
composite hypothesis that $\theta \le \theta_0$, provided that the probability of rejecting
the null hypothesis is $\le \alpha$ whenever the true value of $\theta$ is $\le \theta_0$. This condition
is fulfilled, for instance, when $X$ has a normal, binomial or Poisson distribution.

5.2. Application to binomial distributions. 5.2.1. Statement of the problem.
The case of a binomial distribution arises when the result of a single observation
is a classification into one of two categories. For example, this is the
situation in acceptance inspection of manufactured products, if each unit
inspected is classified into one of the two categories, non-defective and defective.
Let $p$ denote the probability that an item belongs to a given category. The
value of $p$ is usually unknown. We shall deal here with the problem of testing
the hypothesis that $p$ does not exceed a given value $p'$ against the alternative
possibility that $p > p'$.

Since acceptance inspection of manufactured products is perhaps the most
important and widest field of application of such a test procedure, we shall, in
continuing the discussion, use the terminology of acceptance inspection. This,
of course, does not mean that the test procedure is not applicable to other
cases. Suppose that a lot containing a large number of units is submitted for
sampling inspection. Let $p$ denote the proportion of defective units contained
in the lot. The probability that a unit drawn at random from the lot will be
defective is equal to $p$. If $m$ units are drawn at random from the lot, the probability
that there will be $d$ defectives among them is given by¹³

(5.1)    $\dfrac{m!}{d!\,(m-d)!}\;p^d(1-p)^{m-d} \qquad (d = 0, 1, \ldots, m).$

The probability distribution as given in (5.1) is called a binomial distribution.

The purpose of sampling inspection is to decide whether the lot should be
accepted or rejected. It is clear that for high values of $p$ we want to reject the
lot and for low values of $p$ we want to accept the lot. Thus, it will be possible
to specify a particular value of $p$, say $p'$, so that if $p \le p'$ we wish to accept the
lot, and if $p > p'$ we wish to reject the lot. Thus, our problem is to devise a
proper sampling inspection plan for testing the hypothesis that $p \le p'$.
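
The binomial probabilities (5.1) are straightforward to compute. The short Python sketch below uses the standard-library function `math.comb`; the function name `binomial_probability` and the example values ($m = 10$, $p = 0.1$) are illustrative only.

```python
from math import comb


def binomial_probability(d, m, p):
    """Probability (5.1) of finding exactly d defectives among m units."""
    return comb(m, d) * p ** d * (1 - p) ** (m - d)


m, p = 10, 0.1
probabilities = [binomial_probability(d, m, p) for d in range(m + 1)]
print(probabilities[0])     # probability of no defectives, equal to 0.9 ** 10
print(sum(probabilities))   # the probabilities over d = 0, ..., m add up to 1
```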

5.2.2. Tolerated risks for making a wrong decision. No sampling inspection
plan can guarantee that the correct decision will always be made, i.e., that the
lot will always be accepted when $p \le p'$ and the lot will always be rejected when
$p > p'$, unless the lot is inspected completely. A complete inspection is usually
rather uneconomical and one is willing to take some risk of making a wrong
decision if this permits a reduction in the amount of inspection. Hence, recommendations
as to the proper choice of a sampling inspection plan can be made
only after the risks that can be tolerated have been stated.

¹³ Formula (5.1) is exact only if the lot contains infinitely many units. While the lot is
always finite in practice, we shall assume that $m$ is small as compared with the lot size so
that formula (5.1) can be used.

If $p$ is equal to the marginal value $p'$, we may say that it is indifferent to us
whether the lot is accepted or rejected. If $p < p'$ we prefer acceptance and
this preference is the stronger the smaller $p$ is. Similarly, if $p > p'$ we prefer
rejection of the lot and this preference increases as $p$ increases. Thus, it will
be possible to select a value $p_0 < p'$ and a value $p_1 > p'$ such that the error is
considered serious only if we accept the lot when $p \ge p_1$, or we reject the lot
when $p \le p_0$.

After the two values $p_0$ and $p_1$ have been selected the risks that we are willing
to tolerate may reasonably be stated as follows: a sampling inspection plan is
required such that the probability of rejecting the lot is less than or equal to a
preassigned value $\alpha$ whenever $p \le p_0$, and the probability of accepting the lot
is less than or equal to a preassigned value $\beta$ whenever $p \ge p_1$. Thus, the
tolerated risks are characterized by the four quantities $p_0$, $p_1$, $\alpha$ and $\beta$. The
proper sampling plan can be determined after these four quantities have been
chosen.

5.2.3. The sequential probability ratio test corresponding to the quantities $p_0$,
$p_1$, $\alpha$ and $\beta$. Let $H_0$ be the hypothesis that $p = p_0$ and $H_1$ the hypothesis that
$p = p_1$. Consider the sequential probability ratio test $T$ for testing $H_0$ against
$H_1$ for which $\alpha$ is the probability of accepting $H_1$ when $H_0$ is true (error of the
first kind) and $\beta$ is the probability of accepting $H_0$ when $H_1$ is true (error of the
second kind). This probability ratio test will satisfy all our requirements, since
for this test the probability of accepting the lot (accepting $H_0$) is $\le \beta$ whenever
$p \ge p_1$ and the probability of rejecting the lot (accepting $H_1$) is $\le \alpha$ whenever
$p \le p_0$.

According to formulas (3.8), (3.9), (3.10) and section 3.3 the sequential test
$T$ is given as follows: At each stage of the inspection, at the $m$-th observation
for each integral value of $m$, calculate the quantity

(5.2)    $\dfrac{p_{1m}}{p_{0m}} = \dfrac{p_1^{d_m}(1-p_1)^{m-d_m}}{p_0^{d_m}(1-p_0)^{m-d_m}} \qquad (m = 1, 2, \ldots)$

where $d_m$ denotes the number of defectives found in the first $m$ units inspected.
Reject the lot (accept $H_1$) if

(5.3)    $\dfrac{p_{1m}}{p_{0m}} \ge \dfrac{1-\beta}{\alpha}.$

Accept the lot if

(5.4)    $\dfrac{p_{1m}}{p_{0m}} \le \dfrac{\beta}{1-\alpha}.$

Take an additional observation if¹⁴

(5.5)    $\dfrac{\beta}{1-\alpha} < \dfrac{p_{1m}}{p_{0m}} < \dfrac{1-\beta}{\alpha}.$

For the purpose of practical computations it is useful to rewrite the inequalities
(5.3), (5.4) and (5.5) in a somewhat different form. Taking the logarithms of
both sides of the inequalities (5.3), (5.4) and (5.5) one can easily verify that
these inequalities are equivalent to

(5.6)    $d_m \ge \dfrac{\log\dfrac{1-\beta}{\alpha}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}} + m\,\dfrac{\log\dfrac{1-p_0}{1-p_1}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}},$

(5.7)    $d_m \le \dfrac{\log\dfrac{\beta}{1-\alpha}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}} + m\,\dfrac{\log\dfrac{1-p_0}{1-p_1}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}},$

and

(5.8)    $\dfrac{\log\dfrac{\beta}{1-\alpha}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}} + m\,\dfrac{\log\dfrac{1-p_0}{1-p_1}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}} < d_m < \dfrac{\log\dfrac{1-\beta}{\alpha}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}} + m\,\dfrac{\log\dfrac{1-p_0}{1-p_1}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}},$

respectively. Using the inequalities (5.6), (5.7) and (5.8) the test procedure can easily be
carried out as follows: For each $m$ we compute the acceptance number

(5.9)    $A_m = \dfrac{\log\dfrac{\beta}{1-\alpha}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}} + m\,\dfrac{\log\dfrac{1-p_0}{1-p_1}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}}$

and the rejection number

(5.10)    $R_m = \dfrac{\log\dfrac{1-\beta}{\alpha}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}} + m\,\dfrac{\log\dfrac{1-p_0}{1-p_1}}{\log\dfrac{p_1}{p_0} - \log\dfrac{1-p_1}{1-p_0}}.$

¹⁴ There is a slight approximation involved in the formulas (5.3), (5.4) and (5.5). For
details see section 3.3.

These acceptance numbers $A_m$ and rejection numbers $R_m$ are best tabulated
before inspection starts. Inspection is continued as long as $A_m < d_m < R_m$.
At the first time when $d_m$ does not lie between the acceptance and rejection
numbers, the sampling inspection is terminated. The lot is accepted if $d_m \le A_m$
and the lot is rejected if $d_m \ge R_m$.

The test procedure can also be carried out graphically as indicated in Figure 2.
The number $m$ of observations made is measured along the abscissa axis. Since
$A_m$ is a linear function of $m$, the points $(m, A_m)$ will lie on a straight line $L_0$.
Similarly, the points $(m, R_m)$ will lie on a straight line $L_1$. We draw the lines
$L_0$ and $L_1$ and the points $(m, d_m)$ are plotted as inspection goes on. At the first
time when the point $(m, d_m)$ does not lie between the lines $L_0$ and $L_1$ inspection
is terminated. The lot is rejected if the point $(m, d_m)$ lies on $L_1$ or above, and the
lot is accepted if the point $(m, d_m)$ lies on $L_0$ or below.
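
The inspection rule based on the acceptance and rejection numbers can be sketched in a few lines of Python. The code assumes that $A_m$ and $R_m$ share the slope $\log\frac{1-p_0}{1-p_1}$ divided by $\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}$ and have intercepts $\log\frac{\beta}{1-\alpha}$ and $\log\frac{1-\beta}{\alpha}$ over the same denominator, as follows from taking logarithms in (5.3)-(5.5). The function names and the example values ($p_0 = 0.1$, $p_1 = 0.3$, $\alpha = \beta = 0.05$) are illustrative.

```python
from math import log


def sprt_lines(p0, p1, alpha, beta):
    """Common slope and the two intercepts of the lines L0 (accept) and L1 (reject)."""
    denom = log(p1 / p0) - log((1 - p1) / (1 - p0))
    slope = log((1 - p0) / (1 - p1)) / denom
    accept_intercept = log(beta / (1 - alpha)) / denom
    reject_intercept = log((1 - beta) / alpha) / denom
    return slope, accept_intercept, reject_intercept


def inspect(outcomes, p0, p1, alpha, beta):
    """Apply the rule to a sequence of 0/1 outcomes (1 = defective).

    Returns ("reject" | "accept" | "continue", number of units inspected).
    """
    slope, accept_intercept, reject_intercept = sprt_lines(p0, p1, alpha, beta)
    defectives = 0
    for m, outcome in enumerate(outcomes, start=1):
        defectives += outcome
        if defectives >= reject_intercept + slope * m:   # d_m >= R_m
            return "reject", m
        if defectives <= accept_intercept + slope * m:   # d_m <= A_m
            return "accept", m
    return "continue", len(outcomes)


print(inspect([1] * 10, 0.1, 0.3, 0.05, 0.05))   # a run of defectives
print(inspect([0] * 20, 0.1, 0.3, 0.05, 0.05))   # a run of non-defectives
```

With these example values a run of defectives leads to rejection at the third unit, and a run of non-defectives leads to acceptance at the twelfth unit.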

5.2.4. The operating characteristic curve of the test. As mentioned in section
5.2.3 the test procedure defined by the inequalities (5.6), (5.7) and (5.8) will
satisfy the requirement that the probability of accepting the lot is $\le \beta$ whenever
$p \ge p_1$ and the probability of rejecting the lot is $\le \alpha$ whenever $p \le p_0$.
Although this already describes the essential features of the test procedure, it
may be desirable to know the probability $L_p$ of accepting the lot for any possible
value $p$ of the proportion of defectives in the lot. Clearly, $L_p$ will be a function
of $p$ and can be plotted as shown in Figure 3. The curve $L_p$ is called the operating
characteristic curve. The range of $p$ is, of course, from 0 to 1. $L_p = 1$
for $p = 0$ and $L_p = 0$ for $p = 1$. The value of $L_p$ decreases as $p$ increases.
We already know that $L_{p_0} = 1 - \alpha$ and $L_{p_1} = \beta$. Now we shall give a method
for computing the value of $L_p$ for any $p$. If $p_1$ is not far from $p_0$, which will
usually be the case in practice, a good approximation to $L_p$ is given by (see
equation 3.35)

(5.11)    $L_p \sim \dfrac{\left(\dfrac{1-\beta}{\alpha}\right)^h - 1}{\left(\dfrac{1-\beta}{\alpha}\right)^h - \left(\dfrac{\beta}{1-\alpha}\right)^h}$

where $h$ is equal to the non-zero root of the equation

(5.12)    $p\left(\dfrac{p_1}{p_0}\right)^h + (1-p)\left(\dfrac{1-p_1}{1-p_0}\right)^h = 1.$

To plot the operating characteristic curve, it is not necessary to solve (5.12)
with respect to $h$. Instead we can proceed as follows: From (5.12) we express
$p$ as a function of $h$, i.e.,

(5.13)    $p = \dfrac{1 - \left(\dfrac{1-p_1}{1-p_0}\right)^h}{\left(\dfrac{p_1}{p_0}\right)^h - \left(\dfrac{1-p_1}{1-p_0}\right)^h}.$

For any given value $h$ we compute the value of $p$ from (5.13) and the value of
$L_p$ from (5.11). The point $(p, L_p)$ obtained in this way will be a point of the
operating characteristic curve. Doing this for various values of $h$ we can
obtain a sufficient number of points on the operating characteristic curve so
that the curve can be drawn.

5.2.5. The average amount of inspection required by the test. Denote by $E_p(n)$
the expected value of the number of observations required by the test. Clearly,
$E_p(n)$ is a function of $p$. According to (4.8) a good approximation to the value
of $E_p(n)$ is given by

(5.14)    $E_p(n) \sim \dfrac{L_p\log\dfrac{\beta}{1-\alpha} + (1 - L_p)\log\dfrac{1-\beta}{\alpha}}{p\log\dfrac{p_1}{p_0} + (1-p)\log\dfrac{1-p_1}{1-p_0}}$

where $L_p$ is given by (5.11). Plotting $E_p(n)$ as a function of $p$, the curve obtained
will, in general, be of the type shown in Fig. 4. The maximum will ordinarily
be reached between $p_0$ and $p_1$. Furthermore, the curve will, in general, be
increasing as $p$ increases from 0 to $p_0$, and decreasing as $p$ increases from $p_1$
to 1.
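
The parametric construction of the operating characteristic curve, together with the approximation for $E_p(n)$, can be sketched as follows. The sketch assumes the forms $L_p \sim (A^h - 1)/(A^h - B^h)$ with $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$ for (5.11), $p$ expressed through $h$ as in (5.13), and the ratio (5.14) for $E_p(n)$. The function name and the example values are illustrative. The value $h = 0$ is excluded because (5.13) and (5.14) take the indeterminate form $0/0$ there.

```python
from math import log


def oc_and_asn_point(h, p0, p1, alpha, beta):
    """Return (p, L_p, E_p(n)) for a non-zero value of the parameter h."""
    if h == 0:
        raise ValueError("h must be non-zero")
    upper = (1 - beta) / alpha        # A
    lower = beta / (1 - alpha)        # B
    ratio_defective = (p1 / p0) ** h
    ratio_good = ((1 - p1) / (1 - p0)) ** h
    p = (1 - ratio_good) / (ratio_defective - ratio_good)           # (5.13)
    accept_prob = (upper ** h - 1) / (upper ** h - lower ** h)      # (5.11)
    mean_z = p * log(p1 / p0) + (1 - p) * log((1 - p1) / (1 - p0))
    expected_n = (accept_prob * log(lower) + (1 - accept_prob) * log(upper)) / mean_z  # (5.14)
    return p, accept_prob, expected_n


for h in (2.0, 1.0, 0.5, -0.5, -1.0, -2.0):
    print(h, oc_and_asn_point(h, 0.1, 0.3, 0.05, 0.05))
```

As a check, $h = 1$ gives $p = p_0$ with $L_p = 1 - \alpha$, and $h = -1$ gives $p = p_1$ with $L_p = \beta$, in agreement with the values of $L_{p_0}$ and $L_{p_1}$ stated in section 5.2.4.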



5.3. Sequential analysis of double dichotomies. 5.3.1. Formulation of the
problem. Suppose that we want to compare the effectiveness of two production
processes where the effectiveness of a production process is measured in terms
of the proportion of effective units in the sequence produced. We shall say that
a unit is effective if it has a certain desirable property, for example, if it withstands
a certain strain. Let $p_1$ be the proportion of effectives if process 1 is
used, and $p_2$ the proportion of effectives if process 2 is used. In other words,
$p_1$ is the probability that a unit produced will be effective if process 1 is used,
and $p_2$ is the probability that a unit produced will be effective if process 2 is
used. Suppose that the manufacturer does not know the values of $p_1$ and $p_2$,
and that process 1 is in operation. If $p_1 \ge p_2$, then the manufacturer wants to
retain process 1. However, if $p_1 < p_2$, especially if $p_1$ is substantially smaller
than $p_2$, the manufacturer would like to replace process 1 by process 2. Thus,
we are interested in testing the hypothesis that $p_1 \ge p_2$ against the alternative
that $p_1 < p_2$.

A more general formulation of the problem can be given as follows: Consider
two binomial distributions. Let $p_1$ be the probability of a success in a single
trial according to the first binomial distribution, and let $p_2$ be the probability
of a success in a single trial according to the second binomial distribution.
We shall use the symbol 1 for success and the symbol 0 for failure. Suppose
that the probabilities $p_1$ and $p_2$ are unknown. We consider the problem of testing
the hypothesis that $p_1 \ge p_2$ on the basis of a sample consisting of $N_1$ observations
from the first binomial distribution and $N_2$ observations from the second
binomial population. Since in many experiments the case $N_1 = N_2$ is mainly
of interest, and since this case (as we shall see later) makes an exact and simplified
mathematical treatment of the problem possible, we shall assume in what
follows that $N_1 = N_2 = N$ (say).

Thus, on the basis of the outcome of the two series of $N$ independent trials
we have to decide whether the hypothesis $p_1 \ge p_2$ should be accepted or rejected.

5.3.2. The classical method. The classical solution of the problem for large $N$
is given as follows: Let $S_1$ be the number of successes in the first set of $N$ trials
(drawn from the first binomial population), and let $S_2$ be the number of successes
in the second set of $N$ trials (drawn from the second binomial population).
Denote $\frac{S_1 + S_2}{2N}$ by $\bar{p}$ and $1 - \bar{p}$ by $\bar{q}$. Then for large $N$ the expression

(5.15)    $$\frac{S_2 - S_1}{\sqrt{2N\bar{p}\bar{q}}}$$

is normally distributed with zero mean and unit variance if $p_1 = p_2$. Suppose
that the level of significance we wish to choose is $\alpha$. Let $\lambda_\alpha$ be the value for
which the probability that a normal variate with zero mean and unit variance
will exceed $\lambda_\alpha$ is equal to $\alpha$. (For example, if $\alpha = .05$, $\lambda_\alpha = 1.64$.) Thus, if
$p_1 = p_2$, the probability that the expression (5.15) will exceed $\lambda_\alpha$ is equal to $\alpha$.
If $p_1 > p_2$, the probability that the expression (5.15) will exceed $\lambda_\alpha$ is less than $\alpha$.
According to the classical method the hypothesis that $p_1 \geq p_2$ is rejected if the
observed value of (5.15) exceeds $\lambda_\alpha$. This method involves an approximation.
The distribution of the expression (5.15) is not exactly normal even for large $N$.
For small $N$ this method cannot be used, since the distribution of (5.15) is far
from normal. For small $N$, R. A. Fisher has proposed an exact method which,
however, involves cumbersome calculations. In section 5.3.3 we shall suggest
another method which is exact (does not involve any approximations) and is
simple to apply as far as computations are concerned. The latter method has
the further advantage of being suitable for sequential analysis, to which existing
methods are not readily adaptable.
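As an illustration of the classical rule, the statistic (5.15) and the comparison with $\lambda_\alpha$ might be computed as in the following minimal sketch. The function names are hypothetical, and the sketch assumes the pooled proportion $\bar{p} = (S_1 + S_2)/2N$ as the estimate entering the denominator.

```python
import math
from statistics import NormalDist


def classical_statistic(s1: int, s2: int, n: int) -> float:
    """Expression (5.15): (S_2 - S_1) / sqrt(2 N p q), with pooled p (assumed)."""
    p_bar = (s1 + s2) / (2 * n)
    q_bar = 1.0 - p_bar
    if p_bar in (0.0, 1.0):
        raise ValueError("pooled proportion is 0 or 1; the statistic is undefined")
    return (s2 - s1) / math.sqrt(2 * n * p_bar * q_bar)


def classical_reject(s1: int, s2: int, n: int, alpha: float = 0.05) -> bool:
    """Reject the hypothesis p_1 >= p_2 when (5.15) exceeds lambda_alpha."""
    lam = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha point of N(0, 1)
    return classical_statistic(s1, s2, n) > lam
```

The rule is only as good as the normal approximation, so a sketch of this kind should not be relied upon for small $N$.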

5.3.3. An exact method. Let $a_1, \cdots, a_N$ be the results in the first set of $N$
trials, and $b_1, \cdots, b_N$ the results in the second set of $N$ trials. These results are
arranged in the order observed. Consider the sequence of $N$ pairs

(5.16)    $$(a_1, b_1), \cdots, (a_N, b_N).$$

Let $t_1$ be the number of pairs (1, 0) and $t_2$ the number of pairs (0, 1) in this
sequence. We consider only the pairs (0, 1) and (1, 0) and base the test on them.
Let $a$ be the outcome of an observation from the first population, and $b$ the
outcome of an observation from the second population. The probability that
$(a, b) = (1, 0)$ is equal to $p_1(1 - p_2)$, and the probability that $(a, b) = (0, 1)$ is
equal to $(1 - p_1)p_2$. Hence, knowing that $(a, b)$ is equal to one of the pairs
(0, 1) and (1, 0), the (conditional) probability that it is equal to (0, 1) is given by


(5.17)    $$p = \frac{(1 - p_1)p_2}{p_1(1 - p_2) + p_2(1 - p_1)},$$

and the (conditional) probability that it is equal to (1, 0) is given by

(5.18)    $$1 - p = \frac{p_1(1 - p_2)}{p_1(1 - p_2) + (1 - p_1)p_2}.$$


Hence, considering only the pairs (1, 0) and (0, 1), the variate $t_2$ is distributed like
the number of successes in a sequence of $t = t_1 + t_2$ independent trials, the probability
of a success in a single trial being equal to $p$. One can easily verify that
$p = \frac{1}{2}$ if $p_1 = p_2$, $p < \frac{1}{2}$ if $p_1 > p_2$ and $p > \frac{1}{2}$ if $p_1 < p_2$. Thus, the hypothesis
to be tested, i.e., the hypothesis that $p_1 \geq p_2$, is equivalent to the hypothesis
that $p \leq \frac{1}{2}$. Thus, we can test the hypothesis that $p_1 \geq p_2$ by testing the
hypothesis that $p \leq \frac{1}{2}$ on the basis of the observed value of $t_2$. Since the distribution
of $t_2$ is the same as the distribution of the number of successes in $t = t_1 + t_2$
independent trials ($t$ is treated as a constant and the probability of a success
in a single trial is equal to $p$), the test procedure can be carried out in the usual
manner. If we want a level of significance $\alpha$, a critical value $T$ is chosen so that
for $p = \frac{1}{2}$ the probability that $t_2 \geq T$ is equal to $\alpha$. The hypothesis that $p \leq \frac{1}{2}$
is rejected if and only if the observed $t_2$ is greater than or equal to the critical
value $T$. The value of $T$ can be obtained from a table of the binomial distribution.
If $t$ is large, $t_2$ is nearly normally distributed and the critical value $T$ can
be obtained from a table of the normal distribution.
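The procedure of this section can be sketched in a few lines. The helper names below are hypothetical. Because the binomial distribution is discrete, a tail probability exactly equal to $\alpha$ is generally not attainable, so the sketch assumes the common convention of taking the smallest $T$ whose tail probability under $p = \frac{1}{2}$ does not exceed $\alpha$.

```python
from math import comb


def count_discordant(a, b):
    """Return (t_1, t_2): the numbers of pairs (1, 0) and (0, 1), in the order observed."""
    t1 = sum(1 for x, y in zip(a, b) if (x, y) == (1, 0))
    t2 = sum(1 for x, y in zip(a, b) if (x, y) == (0, 1))
    return t1, t2


def upper_tail(t, k):
    """P(X >= k) for X distributed as Binomial(t, 1/2)."""
    k = max(k, 0)
    return sum(comb(t, j) for j in range(k, t + 1)) / 2 ** t


def critical_value(t, alpha):
    """Smallest T with P(t_2 >= T | p = 1/2) <= alpha (assumed convention)."""
    for T in range(t + 2):  # T = t + 1 always qualifies, since the tail is then 0
        if upper_tail(t, T) <= alpha:
            return T


def exact_test_reject(a, b, alpha=0.05):
    """Reject the hypothesis p_1 >= p_2 if and only if the observed t_2 >= T."""
    t1, t2 = count_discordant(a, b)
    return t2 >= critical_value(t1 + t2, alpha)
```

Pairs (0, 0) and (1, 1) play no part in the decision; only $t_1$ and $t_2$ enter.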

This procedure thus provides a simple test of the hypothesis that $p_1 \geq p_2$.
The question arises whether the efficiency of this method is as high as that of the
classical method. It would seem that the method suggested here cannot be a
most efficient procedure, since the values of $t_1$ and $t_2$ depend on the order of the
elements in the sequences $(a_1, \cdots, a_N)$ and $(b_1, \cdots, b_N)$, and there is no
particular reason to arrange them in the order observed. However, it has been
shown in [7] that the loss in efficiency as compared with the classical method is
negligible if the number $N$ of trials is large. 14

It should be pointed out that the procedure for testing the hypothesis that
$p_1 \geq p_2$ can be used also for testing the hypothesis that $p_1 = p_2$ if the alternative
hypotheses are restricted to $p_2 > p_1$.

In addition to simplicity and exactness the present method seems superior to
the classical one in the following respect: Suppose that (contrary to the original
assumption) the probability of a success varies from trial to trial. Denote by
$p_1^{(i)}$ the probability of success in the $i$-th trial of the first set, and by $p_2^{(i)}$ the
probability of success in the $i$-th trial in the second set ($i = 1, \cdots, N$). Assume
that the probabilities $p_1^{(i)}$ and $p_2^{(i)}$ are entirely unknown and we wish to test the
hypothesis that $p_1^{(1)} - p_2^{(1)} = \cdots = p_1^{(N)} - p_2^{(N)} = 0$. In this case the classical
method is not applicable, but the present method provides a correct procedure.
Such a situation may arise, for instance, if we want to test the hypothesis that
the probability of a success (hitting the target) is the same for two different guns.
In the course of the experiments the probability of a hit may change due to external
conditions such as wind, disposition of the gunner, etc. However, these
external conditions are likely to affect both guns equally if the trials are made
alternately (or approximately alternately), so that if the two guns are equally
good we have $p_1^{(i)} = p_2^{(i)}$ ($i = 1, \cdots, N$).

14 The author believes that the loss in efficiency is slight even when N is small, although
no exact investigation of this case has been made.

5.3.4. Sequential test of the hypothesis that $p_1 \geq p_2$. In order to devise a proper
sequential test for testing the hypothesis that $p_1 \geq p_2$, we have to state first
what risks of making wrong decisions we are willing to tolerate. The efficiency
of production process 1 may be measured by the ratio of effectives to ineffectives
produced, i.e., by $k_1 = \frac{p_1}{1 - p_1}$. Production process 1 may be regarded
the more efficient the larger the value of $k_1$. Similarly, the efficiency of production
process 2 may be measured by $k_2 = \frac{p_2}{1 - p_2}$. The relative superiority of
production process 2 over process 1 can then reasonably be measured by the
ratio of $k_2$ to $k_1$, i.e., by

(5.19)    $$u = \frac{k_2}{k_1} = \frac{p_2(1 - p_1)}{p_1(1 - p_2)}.$$

If $u = 1$, the two processes are equally good. If $u > 1$, process 2 is superior to
process 1, and if $u < 1$, process 1 is superior to process 2. Thus, the manufacturer
will, in general, be able to select two values of $u$, $u_0$ and $u_1$ say ($u_0 < u_1$),
such that the rejection of process 1 in favor of process 2 is considered an error of
practical importance whenever the true value of $u \leq u_0$, and the maintenance
of process 1 is considered an error of practical importance whenever $u \geq u_1$.
If $u$ lies between $u_0$ and $u_1$, the manufacturer does not care particularly which
decision is taken.

Clearly, we will always have $u_0 < u_1$. If the transition from production
process 1 to process 2 involves some cost or other inconveniences, it seems
reasonable to put $u_0 = 1$ (or $u_0$ may even be slightly greater than one). This
choice of $u_0$ really means that we consider the rejection of process 1 a serious error
whenever this process is not inferior to process 2. On the other hand, if the
transition from process 1 to process 2 does not involve any inconveniences, the
rejection of process 1 in favor of 2 cannot be a serious error when the two processes
are equally efficient, i.e., when $u = 1$. Thus, in such a case, it seems reasonable
to choose $u_0$ somewhat below 1.

After the quantities $u_0$ and $u_1$ have been chosen, the risks that we are willing
to tolerate may reasonably be expressed in the following form: The probability
of rejecting process 1 should not exceed a preassigned value $\alpha$ whenever $u \leq u_0$,
and the probability of maintaining process 1 should not exceed a preassigned
value $\beta$ whenever $u \geq u_1$.

Thus, the risks that we are willing to tolerate are characterized by the four
quantities $u_0$, $u_1$, $\alpha$ and $\beta$. After these four quantities have been chosen, a
proper sequential test can be carried out as follows: The (conditional) probability
that we obtain a pair (0, 1), as given in (5.17), can be expressed as a function
of $u$. In fact


(5.20)    $$p = \frac{(1 - p_1)p_2}{p_1(1 - p_2) + p_2(1 - p_1)} = \frac{\dfrac{(1 - p_1)p_2}{p_1(1 - p_2)}}{\dfrac{(1 - p_1)p_2}{p_1(1 - p_2)} + 1} = \frac{u}{1 + u}.$$

Let $H_0$ denote the hypothesis that $p = \frac{u_0}{1 + u_0}$ and $H_1$ the hypothesis that
$p = \frac{u_1}{1 + u_1}$. A proper sequential test satisfying our requirements concerning
tolerated risks is the sequential probability ratio test of $H_0$ against $H_1$. The
acceptance and rejection numbers for this sequential test can be obtained from
(5.9) and (5.10) by substituting $\frac{u_0}{1 + u_0}$ for $p_0$, $\frac{u_1}{1 + u_1}$ for $p_1$ and $t = t_1 + t_2$ for $m$.

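Thus, for each value of $t$ the acceptance number is given by

(5.21)    $$A_t = \frac{\log \dfrac{\beta}{1 - \alpha}}{\log u_1 - \log u_0} + t\,\frac{\log \dfrac{1 + u_1}{1 + u_0}}{\log u_1 - \log u_0}$$

and the rejection number is given by

(5.22)    $$R_t = \frac{\log \dfrac{1 - \beta}{\alpha}}{\log u_1 - \log u_0} + t\,\frac{\log \dfrac{1 + u_1}{1 + u_0}}{\log u_1 - \log u_0}.$$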


These acceptance numbers $A_t$ and rejection numbers $R_t$ ($t = 1, 2, \cdots$) are best
tabulated before experimentation starts. The sequential test is then carried out
as follows: The observations are taken in pairs where each pair consists of an
observation from the first process and an observation from the second process.
We continue taking pairs as long as $A_t < t_2 < R_t$. At the first time when $t_2$
does not lie between the acceptance and rejection numbers, experimentation is
terminated. Process 1 is maintained if at this final stage $t_2 \leq A_t$, and process 1
is rejected in favor of 2 if $t_2 \geq R_t$.
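The tabulation of $A_t$ and $R_t$ and the stopping rule just described might be sketched as follows. The function names and the returned labels are hypothetical; natural logarithms are used, which is immaterial because only ratios of logarithms enter (5.21) and (5.22).

```python
import math


def boundaries(t, u0, u1, alpha, beta):
    """Acceptance number A_t (5.21) and rejection number R_t (5.22)."""
    denom = math.log(u1) - math.log(u0)
    slope = math.log((1 + u1) / (1 + u0)) / denom  # common slope of both lines
    a_t = math.log(beta / (1 - alpha)) / denom + t * slope
    r_t = math.log((1 - beta) / alpha) / denom + t * slope
    return a_t, r_t


def sequential_test(pairs, u0, u1, alpha, beta):
    """Run the test on (a, b) pairs; return the decision and the value of t reached."""
    t = t2 = 0
    for a, b in pairs:
        if a == b:
            continue  # pairs (0, 0) and (1, 1) are disregarded
        t += 1
        if (a, b) == (0, 1):
            t2 += 1
        a_t, r_t = boundaries(t, u0, u1, alpha, beta)
        if t2 <= a_t:
            return "maintain process 1", t
        if t2 >= r_t:
            return "reject process 1", t
    return "no decision", t
```

For instance, `sequential_test([(0, 1)] * 20, 1.0, 2.0, 0.05, 0.05)` stops as soon as $t_2$ reaches the rejection line.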

The test procedure can also be carried out graphically as shown in Figure 5.
The total number $t$ of pairs (0, 1) and (1, 0) is measured along the horizontal
axis. The points $(t, A_t)$ will lie on a straight line $L_0$, since $A_t$ is a linear function
of $t$. The points $(t, R_t)$ will lie on a parallel line $L_1$. We draw the lines $L_0$ and
$L_1$ and plot the points $(t, t_2)$ as experimentation goes on. At the first time when
the point $(t, t_2)$ is not within the lines $L_0$ and $L_1$ experimentation is terminated.
Process 1 is maintained if at the final stage the point $(t, t_2)$ lies on $L_0$ or below,
and process 1 is rejected if the point $(t, t_2)$ lies on $L_1$ or above.

5.3.5. The operating characteristic curve of the test. For any value $u$ of the ratio
$\frac{k_2}{k_1}$ we shall denote by $L_u$ the probability of maintaining process 1. Clearly,
$L_u$ is a function of $u$. This function $L_u$ is called the operating characteristic curve
of the test. The operating characteristic curve can be determined from the
following two equations, in which $h$ is an auxiliary parameter.

Fig. 5

These equations are:

(5.23)    $$\frac{u}{1 + u} = \frac{1 - \left(\dfrac{1 + u_0}{1 + u_1}\right)^h}{\left(\dfrac{u_1(1 + u_0)}{u_0(1 + u_1)}\right)^h - \left(\dfrac{1 + u_0}{1 + u_1}\right)^h}$$

and

(5.24)    $$L_u = \frac{\left(\dfrac{1 - \beta}{\alpha}\right)^h - 1}{\left(\dfrac{1 - \beta}{\alpha}\right)^h - \left(\dfrac{\beta}{1 - \alpha}\right)^h}.$$

For any given value $h$ we compute the values of $u$ and $L_u$ from these equations.
The point $(u, L_u)$ obtained in this way will be a point of the operating characteristic
curve. Calculating the points $(u, L_u)$ for a sufficiently large number of
values of $h$ we can draw the operating characteristic curve.
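A point of the operating characteristic curve for a given $h$ might be computed as in the sketch below. It assumes that $u$ and $L_u$ are given parametrically by the usual sequential probability ratio test formulas for a binomial parameter, with $\frac{u_0}{1+u_0}$ and $\frac{u_1}{1+u_1}$ substituted for the two hypothetical values of $p$; the function name is hypothetical.

```python
import math


def oc_point(h, u0, u1, alpha, beta):
    """Return (u, L_u) for a non-zero value of the auxiliary parameter h."""
    if h == 0:
        raise ValueError("h = 0 gives 0/0; use a nearby non-zero value")
    x = ((1 + u0) / (1 + u1)) ** h
    y = (u1 * (1 + u0) / (u0 * (1 + u1))) ** h
    p = (1 - x) / (y - x)          # this is u / (1 + u)
    u = p / (1 - p)
    a = ((1 - beta) / alpha) ** h
    b = (beta / (1 - alpha)) ** h
    l_u = (a - 1) / (a - b)        # probability of maintaining process 1
    return u, l_u
```

As a check on the parametrization, $h = 1$ should reproduce the point $(u_0, 1 - \alpha)$ and $h = -1$ the point $(u_1, \beta)$.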

5.3.6. The average amount of inspection required by the test. For any value $u$
of the ratio $\frac{k_2}{k_1}$ denote by $E_u(t)$ the expected value of the total number of pairs
(0, 1) and (1, 0) required by the test. The value of $E_u(t)$ can be obtained from
(5.14) by substituting $E_u(t)$ for $E_p(n)$, $L_u$ for $L_p$, $\frac{u_1}{1 + u_1}$ for $p_1$ and $\frac{u_0}{1 + u_0}$ for
$p_0$. Thus

(5.25)    $$E_u(t) = \frac{L_u \log \dfrac{\beta}{1 - \alpha} + (1 - L_u) \log \dfrac{1 - \beta}{\alpha}}{\dfrac{u}{1 + u} \log \dfrac{u_1(1 + u_0)}{u_0(1 + u_1)} + \dfrac{1}{1 + u} \log \dfrac{1 + u_0}{1 + u_1}}.$$

To compute the expected value of the total number of pairs (including also
the pairs (0, 0) and (1, 1)), we merely have to divide the right side expression in
(5.25) by $p_1(1 - p_2) + p_2(1 - p_1)$.
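Formula (5.25) and the subsequent division can be evaluated directly. The sketch below uses hypothetical function names and takes $L_u$ as an input, so that any method of obtaining it may be used.

```python
import math


def expected_discordant_pairs(u, l_u, u0, u1, alpha, beta):
    """E_u(t) as in (5.25): expected number of pairs (0, 1) and (1, 0)."""
    num = (l_u * math.log(beta / (1 - alpha))
           + (1 - l_u) * math.log((1 - beta) / alpha))
    den = ((u / (1 + u)) * math.log(u1 * (1 + u0) / (u0 * (1 + u1)))
           + (1 / (1 + u)) * math.log((1 + u0) / (1 + u1)))
    return num / den  # den vanishes for one value of u between u0 and u1


def expected_total_pairs(e_t, p1, p2):
    """Expected number of all pairs, including (0, 0) and (1, 1)."""
    return e_t / (p1 * (1 - p2) + p2 * (1 - p1))
```

With $u = u_0 = 1$, $u_1 = 2$, $\alpha = \beta = .05$ and $L_u = .95$, the first function gives about 45 discordant pairs.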

In the rare event that no decision is yet reached at a number of pairs equal to 
three times the expected value, we can truncate the test at that stage without 
seriously affecting the probabilities of making a wrong decision (see section 4.6 
in Part I). 

5.3.7. Observations made in groups of r. In applications it may happen that at
each stage in the sequential process instead of drawing a single observation we
draw $r$ observations from each of the binomial distributions. Hence, instead of
a single pair, we have two sets of $r$ observations. If the order of observations
in each such set of $r$ is recorded, we can establish the number of pairs (0, 1) and
the number of pairs (1, 0) for each pair of sets of $r$ observations. In such a case
the test can be carried out as described in section 5.3.4, since after each pair of
sets of $r$ observations we can compute $t$ and $t_2$. The only effect of taking the
observations in groups of $r$ is that more observations will generally be necessary
(approximately enough to fill out a group) and thereby the probability of making
an incorrect decision will be made somewhat smaller. However, if the order of
observations in such groups of $r$ is not recorded, the difficulty arises that we are
not able to determine the values of $t$ and $t_2$ needed for the test procedure. It has
been shown in [7] that in such a case we may replace $t_1$ and $t_2$ by certain estimates
of $t_1$ and $t_2$ without affecting seriously the probability of making an incorrect
decision. The estimates of $t_1$ and $t_2$ (and thereby also an estimate of $t = t_1 + t_2$)
are obtained as follows: Let $r_1$ be the number of successes in the group of $r$ observations
drawn from the first binomial distribution, and let $r_2$ be the number
of successes in the group of $r$ observations drawn from the second binomial distribution.
Then for this pair of groups of $r$ observations, we estimate the number
of pairs (1, 0) to be $r_1 - \frac{r_1 r_2}{r}$ and the number of pairs (0, 1) to be $r_2 - \frac{r_1 r_2}{r}$. Thus,
an estimate of $t_1$ is obtained by summing $r_1 - \frac{r_1 r_2}{r}$ over all pairs of groups observed,
and that of $t_2$ is obtained by summing $r_2 - \frac{r_1 r_2}{r}$ over all pairs of groups
observed.
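The estimates described above amount to a simple summation over the groups, as in this brief sketch (the function name and the input layout, a list of $(r_1, r_2)$ counts, are assumptions made for illustration).

```python
def estimate_discordant(groups, r):
    """Estimate t_1, t_2 and t from success counts (r_1, r_2) per pair of groups of size r."""
    t1_hat = sum(r1 - r1 * r2 / r for r1, r2 in groups)  # estimated pairs (1, 0)
    t2_hat = sum(r2 - r1 * r2 / r for r1, r2 in groups)  # estimated pairs (0, 1)
    return t1_hat, t2_hat, t1_hat + t2_hat
```

For two pairs of groups of size 4 with success counts (3, 1) and (2, 2), the estimates are 3.25 for $t_1$, 1.25 for $t_2$ and 4.5 for $t$.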

5.4. Application to testing the mean of a normal distribution with known standard
deviation.

5.4.1. Formulation of the problem. Suppose that a measurable
quantity $x$ is normally distributed with unknown mean $\theta$ and known standard
deviation $\sigma$. For example, $x$ may be some measurable quality characteristic
of a unit of a certain product where $x$ is normally distributed with a known
standard deviation in the population of all units. The problem we shall consider
here is to test the hypothesis that the unknown mean $\theta$ is less than a specified
value $\theta'$. This problem arises frequently, for example, in quality control.
Suppose that the quality of the product is considered the better the higher the
mean value of $x$. Thus, there will be a value $\theta'$ such that the product is considered
sub-standard if $\theta < \theta'$ and the product is considered to meet specifications
if $\theta \geq \theta'$. Since $\theta$ is unknown, we are usually interested in testing the hypothesis
that $\theta < \theta'$, i.e., that the product is sub-standard.

Since quality control is an important field of application for such test proce¬ 
dures, the discussion will be continued in the terminology of quality control. 
This, of course, should not be interpreted as a restriction upon the general 
validity and applicability of the test procedure. The problem treated in section 
5.4 can now be stated as follows: Let x be a measurable quality characteristic 
of a unit of a certain product. The variable x is supposed to be normally 
distributed with known standard deviation in the population of all units pro¬ 
duced. The problem is to devise a sampling plan for testing the hypothesis 
that the product is sub-standard. The product is said to be sub-standard if
the mean $\theta$ of $x$ is less than a given specified value $\theta'$.

5.4.2. Tolerated risks for making a wrong decision. No sampling plan can
guarantee that the correct decision will always be made, i.e., that the product
will be declared sub-standard if and only if $\theta < \theta'$. The larger the amount of
inspection, the smaller we can make the risks for making a wrong decision. If 
inspection is costly, or destructive, we are willing to tolerate some risks of making 
wrong decisions in order to reduce the necessary amount of inspection. Thus, 
a proper sampling plan can be recommended only after the risks that can be 
tolerated have been stated. 

If the quality of the product is exactly on the margin, i.e., if $\theta = \theta'$, then it
will make little difference whether the product is classified as sub-standard or
not. However, if $\theta$ is considerably smaller than $\theta'$, then the acceptance of the
hypothesis that the product meets specifications (rejection of the hypothesis
that the product is sub-standard) will usually be considered as a serious error.
Similarly, if $\theta$ is much larger than $\theta'$, the acceptance of the hypothesis that the
product is sub-standard will generally be considered as a serious error. Thus,
the manufacturer will, in general, be able to select two values of $\theta$, $\theta_0$ and $\theta_1$ say
($\theta_0 < \theta'$ and $\theta_1 > \theta'$), such that the classification of the product as satisfactory
(meeting specifications) is considered an error of practical importance whenever
$\theta \leq \theta_0$, and the classification of the product as sub-standard is considered an
error of practical importance whenever $\theta \geq \theta_1$. If $\theta$ lies between $\theta_0$ and $\theta_1$, a
wrong classification of the product will not be viewed as a serious error, since
in this case $\theta$ is near the marginal value $\theta'$.

After the two values $\theta_0$ and $\theta_1$ have been selected, the risks that we are willing
to tolerate can be stated in the following form: A sampling plan is required
such that the probability of classifying the product as satisfactory is less than
or equal to a preassigned quantity $\alpha$ whenever $\theta \leq \theta_0$, and such that the probability
of classifying the product as sub-standard is less than or equal to a
preassigned quantity $\beta$ whenever $\theta \geq \theta_1$. Thus, the tolerated risks are characterized
by the four quantities $\theta_0$, $\theta_1$, $\alpha$ and $\beta$. A proper sampling plan can
be devised after these four quantities have been selected.

5.4.3. A sequential test of the hypothesis that $\theta < \theta'$ (the product is sub-standard).
Let $H_0$ be the hypothesis that $\theta = \theta_0$ and let $H_1$ be the hypothesis that $\theta = \theta_1$.
Let $T$ be the sequential probability ratio test for testing $H_0$ against $H_1$ such that
$\alpha$ is the probability of accepting $H_1$ when $H_0$ is true and $\beta$ is the probability of
accepting $H_0$ when $H_1$ is true. This sequential test will satisfy all our requirements,
since for this test the probability of accepting $H_0$ (declaring the product
as sub-standard) is $\leq \beta$ whenever $\theta \geq \theta_1$, and the probability of accepting $H_1$
(declaring the product as satisfactory) is $\leq \alpha$ whenever $\theta \leq \theta_0$.

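The sequential test $T$ is given as follows: Denote the successive observations
on $x$ by $x_1, x_2, \cdots$, etc. Accept the hypothesis that the product is satisfactory
at the $m$-th observation if

(5.26)    $$\log \frac{e^{-\frac{1}{2\sigma^2} \sum_{a=1}^{m} (x_a - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2} \sum_{a=1}^{m} (x_a - \theta_0)^2}} \geq \log \frac{1 - \beta}{\alpha}.$$

Accept the hypothesis that the product is sub-standard if

(5.27)    $$\log \frac{e^{-\frac{1}{2\sigma^2} \sum_{a=1}^{m} (x_a - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2} \sum_{a=1}^{m} (x_a - \theta_0)^2}} \leq \log \frac{\beta}{1 - \alpha}.$$

Take an additional observation if

(5.28)    $$\log \frac{\beta}{1 - \alpha} < \log \frac{e^{-\frac{1}{2\sigma^2} \sum_{a=1}^{m} (x_a - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2} \sum_{a=1}^{m} (x_a - \theta_0)^2}} < \log \frac{1 - \beta}{\alpha}.$$

The inequalities (5.26), (5.27) and (5.28) are equivalent to

(5.29)    $$\sum_{a=1}^{m} x_a \geq \frac{\sigma^2}{\theta_1 - \theta_0} \log \frac{1 - \beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2},$$

(5.30)    $$\sum_{a=1}^{m} x_a \leq \frac{\sigma^2}{\theta_1 - \theta_0} \log \frac{\beta}{1 - \alpha} + m\,\frac{\theta_0 + \theta_1}{2},$$

(5.31)    $$\frac{\sigma^2}{\theta_1 - \theta_0} \log \frac{\beta}{1 - \alpha} + m\,\frac{\theta_0 + \theta_1}{2} < \sum_{a=1}^{m} x_a < \frac{\sigma^2}{\theta_1 - \theta_0} \log \frac{1 - \beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2},$$

respectively.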

Using the inequalities (5.29), (5.30) and (5.31) the test procedure can easily
be carried out as follows: For each $m$ compute the acceptance number

(5.32)    $$A_m = \frac{\sigma^2}{\theta_1 - \theta_0} \log \frac{\beta}{1 - \alpha} + m\,\frac{\theta_0 + \theta_1}{2}$$

and the rejection number

(5.33)    $$R_m = \frac{\sigma^2}{\theta_1 - \theta_0} \log \frac{1 - \beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2}.$$

These acceptance numbers $A_m$ and rejection numbers $R_m$ are best tabulated
before inspection starts. Inspection is continued as long as
$A_m < \sum_{a=1}^{m} x_a < R_m$. At the first time that $\sum_{a=1}^{m} x_a$ does not lie between
$A_m$ and $R_m$, inspection is terminated. If at this final stage
$\sum_{a=1}^{m} x_a \leq A_m$, the hypothesis that the product is sub-standard is accepted,
and if $\sum_{a=1}^{m} x_a \geq R_m$, the hypothesis that
the product is sub-standard is rejected.
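The rule based on the cumulative sum of the observations might be sketched as follows. The function names and returned labels are hypothetical, and the boundaries are taken in the form of a constant term $\frac{\sigma^2}{\theta_1 - \theta_0}\log(\cdot)$ plus the drift $m\frac{\theta_0 + \theta_1}{2}$.

```python
import math


def normal_boundaries(m, theta0, theta1, sigma, alpha, beta):
    """Acceptance number A_m and rejection number R_m for the cumulative sum."""
    scale = sigma ** 2 / (theta1 - theta0)
    drift = m * (theta0 + theta1) / 2
    a_m = scale * math.log(beta / (1 - alpha)) + drift
    r_m = scale * math.log((1 - beta) / alpha) + drift
    return a_m, r_m


def sequential_mean_test(xs, theta0, theta1, sigma, alpha, beta):
    """Return the classification of the product and the number of observations used."""
    total = 0.0
    m = 0
    for m, x in enumerate(xs, start=1):
        total += x
        a_m, r_m = normal_boundaries(m, theta0, theta1, sigma, alpha, beta)
        if total <= a_m:
            return "sub-standard", m      # hypothesis of sub-standard quality accepted
        if total >= r_m:
            return "satisfactory", m      # hypothesis of sub-standard quality rejected
    return "no decision", m
```

Since both boundaries share the drift term, only the cumulative sum needs to be stored between observations.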

The test procedure can also be carried out graphically as shown in Figure 6.
The number $m$ of observations is measured along the horizontal axis. The
points $(m, A_m)$ will lie on a straight line $L_0$ and the points $(m, R_m)$ will lie on a
parallel line $L_1$. We draw the parallel lines $L_0$ and $L_1$ and plot the points
$\left(m, \sum_{a=1}^{m} x_a\right)$ as inspection goes on. At the first time when the point $\left(m, \sum_{a=1}^{m} x_a\right)$
does not lie between the lines $L_0$ and $L_1$ inspection is terminated. The hypothesis
that the product is sub-standard is rejected if the point $\left(m, \sum_{a=1}^{m} x_a\right)$ lies on $L_1$
or above. The hypothesis in question is accepted if the point $\left(m, \sum_{a=1}^{m} x_a\right)$
lies on $L_0$ or below.

Fig. 6

5.4.4. The operating characteristic curve of the test. For any value $\theta$ denote by
$L_\theta$ the probability that the hypothesis that the product is sub-standard is
accepted. Obviously, $L_\theta$ will be a function of $\theta$ and is called the operating
characteristic curve of the test. The shape of the operating characteristic curve
will, in general, be of the type shown in Figure 7. $L_\theta$ approaches 1 as $\theta \to -\infty$
and $L_\theta$ approaches zero as $\theta \to +\infty$. Furthermore, $L_\theta$ is a decreasing function
of $\theta$. We already know the values of $L_\theta$ for $\theta = \theta_0$ and $\theta = \theta_1$. Now we shall
give a method for computing the value of $L_\theta$ for any $\theta$. If $\frac{\theta_1 - \theta_0}{\sigma}$ is fairly small,
which will usually be the case in practice, a good approximation to $L_\theta$ is given
by (see equation 3.35)

(5.34)    $$L_\theta = \frac{\left(\dfrac{1 - \beta}{\alpha}\right)^h - 1}{\left(\dfrac{1 - \beta}{\alpha}\right)^h - \left(\dfrac{\beta}{1 - \alpha}\right)^h}$$

where the constant $h$ is determined as follows: First we compute the characteristic
function $\varphi(t)$ of the variate

(5.35)    $$z = \log \frac{e^{-\frac{1}{2\sigma^2}(x - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2}(x - \theta_0)^2}} = \frac{1}{2\sigma^2}\left[2(\theta_1 - \theta_0)x + \theta_0^2 - \theta_1^2\right].$$

Thus, $z$ is normally distributed with mean
$\frac{\theta_0^2 - \theta_1^2 + 2(\theta_1 - \theta_0)\theta}{2\sigma^2}$ and variance $\frac{(\theta_1 - \theta_0)^2}{\sigma^2}$. Consequently, $\varphi(t)$ is given by

(5.36)    $$\varphi(t) = \exp\left[\frac{\theta_0^2 - \theta_1^2 + 2(\theta_1 - \theta_0)\theta}{2\sigma^2}\,t + \frac{(\theta_1 - \theta_0)^2}{2\sigma^2}\,t^2\right].$$

The value $h$ is the non-zero real root of the equation $\varphi(t) = 1$. Hence

(5.37)    $$h = \frac{(\theta_1^2 - \theta_0^2) - 2(\theta_1 - \theta_0)\theta}{(\theta_1 - \theta_0)^2} = \frac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0}.$$

The operating characteristic curve can be computed from (5.34) substituting
the right hand side member of (5.37) for $h$.
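The approximate operating characteristic might be evaluated as in the sketch below, which assumes the form $L_\theta \approx (A^h - 1)/(A^h - B^h)$ with $A = \frac{1-\beta}{\alpha}$, $B = \frac{\beta}{1-\alpha}$ and $h = \frac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0}$. The function name is hypothetical, and the value at $h = 0$ is filled in by the limit $\log A / (\log A - \log B)$.

```python
import math


def oc_normal(theta, theta0, theta1, alpha, beta):
    """Approximate probability L_theta of declaring the product sub-standard."""
    h = (theta1 + theta0 - 2 * theta) / (theta1 - theta0)
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    if h == 0:
        return log_a / (log_a - log_b)  # limiting value as h tends to 0
    a_h = math.exp(h * log_a)           # may overflow for very large |h|
    b_h = math.exp(h * log_b)
    return (a_h - 1) / (a_h - b_h)
```

The sketch returns $1 - \alpha$ at $\theta = \theta_0$ and $\beta$ at $\theta = \theta_1$, in agreement with the values already known at these two points.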

5.4.5. The average amount of inspection required by the test. Let $E_\theta(n)$ denote
the expected value of the number of observations required by the test when $\theta$
is the true mean of $x$.

Fig. 7

According to (4.8) a good approximation to the value of $E_\theta(n)$ is given by

$$E_\theta(n) = 2\sigma^2\,\frac{L_\theta \log \dfrac{\beta}{1 - \alpha} + (1 - L_\theta) \log \dfrac{1 - \beta}{\alpha}}{\theta_0^2 - \theta_1^2 + 2(\theta_1 - \theta_0)\theta}$$

where $L_\theta$ is given by (5.34).

In the rare event that the number of observations reaches three times the 
expected value before the test is terminated, we can truncate the test at this 
stage without seriously affecting the probabilities of making a wrong decision. 
(See section 4.6 in Part I). 
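The approximation to $E_\theta(n)$ and the truncation point at three times the expected value might be computed as in the following sketch. The function name is hypothetical, $L_\theta$ is passed in as an argument, and the midpoint $\theta = \frac{\theta_0 + \theta_1}{2}$, where the denominator vanishes, is excluded.

```python
import math


def expected_sample_size(theta, l_theta, theta0, theta1, sigma, alpha, beta):
    """Approximate E_theta(n): expected number of observations required by the test."""
    num = (l_theta * math.log(beta / (1 - alpha))
           + (1 - l_theta) * math.log((1 - beta) / alpha))
    den = theta0 ** 2 - theta1 ** 2 + 2 * (theta1 - theta0) * theta
    if den == 0:
        raise ValueError("formula is undefined at theta = (theta0 + theta1) / 2")
    return 2 * sigma ** 2 * num / den


# Usage: expected size at theta = theta0 and the corresponding truncation point.
e_n = expected_sample_size(0.0, 0.95, 0.0, 1.0, 1.0, 0.05, 0.05)
truncate_at = math.ceil(3 * e_n)
```

With $\theta_0 = 0$, $\theta_1 = 1$, $\sigma = 1$ and $\alpha = \beta = .05$, the expected number of observations at $\theta = \theta_0$ is $1.8 \log 19$, or about 5.3.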


6. Outline of a General Theory of Sequential Tests of Hypotheses when No
Restrictions Are Imposed on the Alternative Values of the Unknown Parameters

6.1. Sequential test of a simple hypothesis with no restrictions on the alternative
values of the unknown parameters. Consider the following general case. Let
$X_1, \cdots, X_p$ be a set of $p$ random variables and let $f(x_1, \cdots, x_p, \theta_1, \cdots, \theta_k)$
be the joint probability density function of these random variables involving $k$
unknown parameters $\theta_1, \cdots, \theta_k$. Suppose that we wish to test the hypothesis
$H_0$ that $\theta_1 = \theta_1^0, \cdots, \theta_k = \theta_k^0$, where $\theta_1^0, \cdots, \theta_k^0$ are some given specified values.
Denote the set of all a priori possible parameter points by $\Omega$. Assume that $\Omega$
contains at least a finite $k$-dimensional sphere with the center $(\theta_1^0, \cdots, \theta_k^0)$.
Let $\Omega^*$ be the set of all possible alternative parameter points; i.e., $\Omega^*$ is the
whole parameter space $\Omega$ with the exception of the point $\theta^0 = (\theta_1^0, \cdots, \theta_k^0)$.
For any statistical procedure for testing $H_0$, the probability of an error of the
first kind will have a definite value, but the probability of an error of the second
kind will depend on the true alternative; i.e., it will be a single-valued function
$\beta(\theta)$ defined over all points $\theta$ of $\Omega^*$. Let $w(\theta)$ be some non-negative function,
called weight function, such that $\int_{\Omega^*} w(\theta)\,d\theta = 1$. Suppose that we wish to
construct a sequential test such that the probability of an error of the first kind
is equal to a given $\alpha$ and that the weighted average $\int_{\Omega^*} w(\theta)\beta(\theta)\,d\theta$ of the
probabilities of errors of the second kind is equal to some given positive value $\beta$.
This problem can easily be solved as follows: Let $p_{0n}$ be equal to the product
$\prod_{a=1}^{n} f(x_{1a}, \cdots, x_{pa}, \theta_1^0, \cdots, \theta_k^0)$ where $x_{ia}$ denotes the $a$-th observation on
$X_i$ ($i = 1, \cdots, p$; $a = 1, \cdots, n$). Furthermore, let $p_{1n}$ be defined by

(6.1)    $$p_{1n} = \int_{\Omega^*} w(\theta) \left[\prod_{a=1}^{n} f(x_{1a}, \cdots, x_{pa}, \theta_1, \cdots, \theta_k)\right] d\theta.$$

The expression $p_{1n}$ can be interpreted as the probability density in the sample
space of $n$ observations on the variates $X_1, \cdots, X_p$, if we assume that the
parameter point $\theta$ in $\Omega^*$ has a probability distribution given by the density
function $w(\theta)$.

We shall denote by $H_1$ the hypothesis that the probability density function
in the sample space of $n$ observations on $X_1, \cdots, X_p$ is given by $p_{1n}$ defined in
equation (6.1). The problem of testing $H_0$ against the single alternative $H_1$
is not exactly of the type discussed in Part I, since $p_{1n}$ given in (6.1) cannot be
represented, in general, as a product of $n$ factors where the $a$-th factor depends
only on the observations $x_{1a}, \cdots, x_{pa}$. However, it was pointed out in section
3.2 that the fundamental inequalities derived in section 3.2 remain valid
also when $p_{1n}$ is given by an expression of the type (6.1). Thus, we can use the
sequential probability ratio test for testing $H_0$ against the single alternative $H_1$.
We reject $H_0$ if

(6.2)    $$\frac{p_{1n}}{p_{0n}} \geq A,$$

we accept $H_0$ if

(6.3)    $$\frac{p_{1n}}{p_{0n}} \leq B,$$

and we make an additional observation if

(6.4)    $$B < \frac{p_{1n}}{p_{0n}} < A.$$

The expression $p_{1n}$ is given by (6.1) and the constants $A$ and $B$ are chosen so
that the probability of accepting $H_1$ when $H_0$ is true is $\alpha$, and the probability
of accepting $H_0$ when $H_1$ is true is $\beta$. Thus, for practical purposes we may put
$A = \frac{1 - \beta}{\alpha}$ and $B = \frac{\beta}{1 - \alpha}$.
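To make the role of the weight function concrete, the following sketch treats a deliberately simplified case that is not taken from the text: a single normal variate with unit variance, $H_0$: $\theta = 0$, and a weight function concentrated on finitely many alternative values, so that the integral in (6.1) reduces to a weighted sum. All names are hypothetical.

```python
import math


def weighted_ratio(xs, thetas, weights):
    """p_1n / p_0n for unit-variance normal data, H_0: theta = 0, discrete weights (assumed)."""
    n = len(xs)
    s = sum(xs)
    # For a single alternative theta the ratio is exp(theta * sum(x) - n * theta^2 / 2).
    return sum(w * math.exp(th * s - n * th * th / 2)
               for th, w in zip(thetas, weights))


def decide(ratio, alpha, beta):
    """Compare the ratio with A = (1 - beta) / alpha and B = beta / (1 - alpha)."""
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    if ratio >= a:
        return "reject H0"
    if ratio <= b:
        return "accept H0"
    return "continue"
```

With weights $\frac{1}{2}$ on each of $\theta = -1$ and $\theta = +1$, the ratio depends on the observations only through $n$ and the absolute value of their sum, which is the one-dimensional analogue of a weight function spread symmetrically about $\theta^0$.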

Using the sequential process defined by the inequalities (6.2), (6.3), and (6.4)
we obviously have

(6.5)    $$\int_{\Omega^*} w(\theta)\beta(\theta)\,d\theta = \beta$$

where for each point $\theta$ in $\Omega^*$, $\beta(\theta)$ denotes the probability of accepting $H_0$ under
the assumption that $\theta$ is the true parameter point.

Thus, the sequential test given by (6.2), (6.3), and (6.4) provides a satisfactory
solution of the problem if we want a test procedure such that the probability of
an error of the first kind is $\alpha$ and the weighted average $\int_{\Omega^*} w(\theta)\beta(\theta)\,d\theta$ of the
probabilities of errors of the second kind is $\beta$. Practical problems, however,
do not always take this form. Many instances require a test procedure such
that $\beta(\theta)$ should be less than or equal to a given positive value $\beta$ for all parameter
points $\theta$ whose "distance" (defined in some sense) from $\theta^0$ is greater than or
equal to some given positive value $d_0$. The "distance" of two parameter points
$\theta^1$ and $\theta^2$ may be defined by some function $\delta(\theta^1, \theta^2)$ which is equal to zero if $\theta^1 = \theta^2$
and is greater than zero if $\theta^1 \neq \theta^2$. Furthermore, for any three points $\theta^1$, $\theta^2$, $\theta^3$
we have $\delta(\theta^1, \theta^2) = \delta(\theta^2, \theta^1)$ and $\delta(\theta^1, \theta^2) + \delta(\theta^3, \theta^2) \geq \delta(\theta^1, \theta^3)$. The distance
function will, in general, be chosen according to practical needs and mathematical
convenience.

Given the distance function $\delta(\theta^1, \theta^2)$ and given the requirements that the
probability of an error of the first kind be $\alpha$ and the probability of an error of
the second kind should not exceed $\beta$ whenever the distance of the true parameter
point from $\theta^0$ is greater than or equal to $d_0$, the aim is, of course, to construct
a sequential test which satisfies these requirements with a minimum expected
number of observations.

While an exact solution of this problem has not yet been found, the following
approach seems reasonable: Let $\Omega_0$ be the set of all parameter points $\theta$ for which
$\delta(\theta^0, \theta) \geq d_0$. We restrict ourselves to the class $C_s$ of sequential tests based on
the ratio $\frac{p_{1n}}{p_{0n}}$ where

(6.6)    $$p_{0n} = \prod_{a=1}^{n} f(x_{1a}, \cdots, x_{pa}, \theta_1^0, \cdots, \theta_k^0),$$

(6.7)    $$p_{1n} = \int_{\Omega_0} w(\theta) \prod_{a=1}^{n} f(x_{1a}, \cdots, x_{pa}, \theta_1, \cdots, \theta_k)\,d\theta$$

and $w(\theta)$ may be any non-negative function of $\theta$, called weight function, for
which

(6.8)    $$\int_{\Omega_0} w(\theta)\,d\theta = 1.$$

For carrying out the sequential test two constants $A$ and $B$ are chosen. The
hypothesis $H_0$ is accepted if $\frac{p_{1n}}{p_{0n}} \leq B$, $H_0$ is rejected if $\frac{p_{1n}}{p_{0n}} \geq A$, and an additional
observation is made if $B < \frac{p_{1n}}{p_{0n}} < A$. The restriction to the class $C_s$ of sequential
tests is suggested by the fact that we are led to these tests if it is required
that some weighted average of the probabilities of errors of the second kind be
equal to a given value $\beta$.

Accepting the restriction that the sequential test should be a member of the
class $C_s$, we still need a principle for choosing the weight function $w(\theta)$. It is
clear that the maximum of $\beta(\theta)$ in $\Omega_0$ depends on the quantities $A$, $B$, and the
weight function $w(\theta)$. Denote this maximum value by $\beta_{\max}[A, B, w(\theta)]$. Since
it is desirable to make $\beta_{\max}[A, B, w(\theta)]$ as small as possible, it is proposed to
determine $w(\theta)$ so that the expression $\beta_{\max}[A, B, w(\theta)]$ becomes a minimum with
respect to $w(\theta)$. Since for given values $A$ and $B$ the value of the weighted
average $\int_{\Omega_0} w(\theta)\beta(\theta)\,d\theta$ is practically independent of $w(\theta)$ (it is nearly equal
to $\frac{B(A - 1)}{A - B}$), minimizing $\beta_{\max}[A, B, w(\theta)]$ is practically equivalent to minimizing
the difference $\beta_{\max}[A, B, w(\theta)] - \int_{\Omega_0} w(\theta)\beta(\theta)\,d\theta$. For convenience we
determine $w(\theta)$ so that $\beta_{\max}[A, B, w(\theta)] - \int_{\Omega_0} w(\theta)\beta(\theta)\,d\theta$ becomes a minimum.
For this weight function the maximum of $\beta(\theta)$ in $\Omega_0$ will depend only on $A$ and $B$.
Denote this value by $\beta(A, B)$. Finally we determine the values $A$ and $B$ so
that $\beta(A, B) = \beta$ and the probability of an error of the first kind becomes $\alpha$.

The determination of $w(\theta)$ is a problem in the calculus of variations. In
some important cases, however, the solution can be obtained by the following
simple procedure: Let $S(d)$ be the set of all parameter points $\theta$ for which
$\delta(\theta^0, \theta) = d$. Let $v(\theta)$ be a non-negative weight function defined over the
surface $S(d_0)$ so that the surface integral $\int_{S(d_0)} v(\theta)\,d\omega = 1$ (where $d\omega$ denotes
the infinitesimal surface element). Consider the following sequential
procedure: Reject $H_0$ if

(6.9)    $$\frac{\int_{S(d_0)} v(\theta) \left[\prod_{a=1}^{n} f(x_{1a}, \cdots, x_{pa}, \theta_1, \cdots, \theta_k)\right] d\omega}{\prod_{a=1}^{n} f(x_{1a}, \cdots, x_{pa}, \theta_1^0, \cdots, \theta_k^0)}$$

is greater than or equal to $A$, accept $H_0$ if (6.9) is less than or equal to $B$, and
make an additional observation if the value of (6.9) lies between $B$ and $A$. The
constants $A$ and $B$ are so chosen that the probability of an error of the first kind
is $\alpha$ and $\int_{S(d_0)} \beta(\theta)v(\theta)\,d\omega = \beta$. In many statistical problems it is possible to
find a weight function $v(\theta)$ such that for a conveniently chosen distance function
$\delta(\theta^1, \theta^2)$ the probability $\beta(\theta)$ of an error of the second kind becomes constant on
the surface $S(d)$ for any value $d$, and, furthermore, $\beta(\theta)$ decreases with increasing
$d$. For such a weight function $v(\theta)$, the sequential test based on (6.9) will
provide a solution of the problem. In fact, the weight function $v(\theta)$ over the
surface $S(d_0)$ can be considered a limiting case of a weight function $w(\theta)$ defined
in $\Omega_0$ which takes the value zero for any $\theta$ whose distance from $\theta^0$ is greater than
$d_0 + \Delta$ with $\Delta$ approaching zero in the limit. For the weight function $v(\theta)$ the
maximum of $\beta(\theta)$ in $\Omega_0$ is equal to the weighted integral of $\beta(\theta)$. Thus, for this
weight function the difference between the maximum of $\beta(\theta)$ and the weighted
integral of $\beta(\theta)$ is minimized.

We shall illustrate this procedure by a simple example. Let $X_1, \dots, X_k$
be $k$ normally and independently distributed variates with unit variances. The
mean values $\theta_1, \dots, \theta_k$ are unknown. Suppose that it is required to test the
hypothesis $H_0$ that $\theta_1 = \dots = \theta_k = 0$. Assume that the distance of two points
$\theta^1$ and $\theta^2$ is equal to

         $+\sqrt{(\theta_1^1 - \theta_1^2)^2 + \dots + (\theta_k^1 - \theta_k^2)^2}$.

Then $S(d)$ is a sphere with center at the origin and radius $d$. Let $v(\theta)$ be
constant on $S(d_0)$ and equal to the reciprocal of the area of $S(d_0)$. We shall show
that for this weight function $v(\theta)$, $\beta(\theta)$ is constant on the sphere $S(d)$ and is
monotonically decreasing with increasing $d$. For this purpose we prove first
that (6.9) is a monotonically increasing function of $\bar{x}_1^2 + \dots + \bar{x}_k^2$ where $\bar{x}_i$
is the arithmetic mean of the observations on $X_i$. In fact, the expression (6.9)
becomes

(6.10)   $\dfrac{c_k\,(2\pi)^{-kn/2} \int_{S(d_0)} \exp\left[-\frac{1}{2}\sum_{i=1}^{k}\sum_{\alpha=1}^{n}(x_{i\alpha} - \theta_i)^2\right] d\omega}{(2\pi)^{-kn/2} \exp\left[-\frac{1}{2}\sum_{i=1}^{k}\sum_{\alpha=1}^{n} x_{i\alpha}^2\right]}$

         $= c_k \exp\left[-\tfrac{1}{2} n d_0^2\right] \int_{S(d_0)} \exp\left[n \sum_{i} \bar{x}_i \theta_i\right] d\omega$

where $c_k$ is the reciprocal of the area of $S(d_0)$ and $\bar{x}_i$ is the arithmetic mean of
the $n$ observations $x_{i\alpha}$ ($\alpha = 1, \dots, n$). Let $r_x$ denote $\left|\sqrt{\bar{x}_1^2 + \dots + \bar{x}_k^2}\right|$ and let
$\alpha(\theta)$ ($0 \le \alpha(\theta) \le \pi$) denote the angle between the vector $(\bar{x}_1, \dots, \bar{x}_k)$ and the
vector $(\theta_1, \dots, \theta_k)$. Then (6.10) can be written

(6.11)   $c_k \exp\left[-\tfrac{1}{2} n d_0^2\right] \int_{S(d_0)} \exp\left(n r_x d_0 \cos[\alpha(\theta)]\right) d\omega$.

Because of the symmetry of the sphere, the value of (6.11) will not be changed
if we substitute $\gamma(\theta)$ for $\alpha(\theta)$ where $\gamma(\theta)$ ($0 \le \gamma(\theta) \le \pi$) denotes the angle
between the vector $\theta$ and an arbitrarily chosen fixed vector $u$. From this it
follows that the value of (6.11) depends only on $r_x$.

Now we shall show that (6.11) is a strictly increasing function of $r_x$. For this
purpose we have merely to show that

(6.12)   $I(r_x) = \int_{S(d_0)} \exp\left(n r_x d_0 \cos[\gamma(\theta)]\right) d\omega$

is a strictly increasing function of $r_x$. We have

(6.13)   $\dfrac{dI(r_x)}{dr_x} = \int_{S(d_0)} n d_0 \cos[\gamma(\theta)] \exp\left(n r_x d_0 \cos[\gamma(\theta)]\right) d\omega$.

Denote by $\omega_1$ the subset of $S(d_0)$ in which $0 \le \gamma(\theta) \le \frac{\pi}{2}$, and by $\omega_2$ the subset
in which $\frac{\pi}{2} \le \gamma(\theta) \le \pi$. Because of the symmetry of the sphere we have

         $\int_{\omega_2} n d_0 \cos[\gamma(\theta)] \exp\left(n r_x d_0 \cos[\gamma(\theta)]\right) d\omega$

         $= \int_{\omega_1} n d_0 \cos[\pi - \gamma(\theta)] \exp\left(n r_x d_0 \cos[\pi - \gamma(\theta)]\right) d\omega$

         $= -\int_{\omega_1} n d_0 \cos[\gamma(\theta)] \exp\left(-n r_x d_0 \cos[\gamma(\theta)]\right) d\omega$.

Hence

(6.14)   $\dfrac{dI(r_x)}{dr_x} = n d_0 \int_{\omega_1} \cos[\gamma(\theta)] \left(\exp\left(n d_0 r_x \cos[\gamma(\theta)]\right) - \exp\left(-n d_0 r_x \cos[\gamma(\theta)]\right)\right) d\omega$.

The right hand side of (6.14) is positive. Hence, we have proved that
expression (6.11) (or (6.10)) is a strictly increasing function of $r_x$.
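The monotonicity of $I(r_x)$ in (6.12) can be checked numerically in the simplest non-trivial case. The following is only a sketch under assumptions that are not in the text: it takes $k = 2$, so that $S(d_0)$ is a circle of radius $d_0$ with surface element $d_0\,d\gamma$, and it uses a plain rectangle rule; the function name and the parameter values are illustrative.

```python
import math

def circle_integral(r_x, n, d0, steps=2000):
    """Approximate I(r_x) of (6.12) for k = 2, where S(d0) is a circle of radius d0."""
    h = 2.0 * math.pi / steps
    # Rectangle rule over the angle gamma; the integrand is periodic in gamma.
    total = sum(math.exp(n * r_x * d0 * math.cos(j * h)) for j in range(steps))
    return d0 * h * total  # the surface element on the circle is d0 * d(gamma)

# Illustrative values only: n = 5 observations, d0 = 0.8.
values = [circle_integral(r, n=5, d0=0.8) for r in (0.0, 0.25, 0.5, 1.0)]
```

At $r_x = 0$ the integral is just the circumference $2\pi d_0$, and the computed values grow with $r_x$, in line with (6.14).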

To show that $\beta(\theta)$ is constant on $S(d)$ and is monotonically decreasing with
increasing $d$, let $y_1, \dots, y_k$ be an orthogonal linear transformation of $x_1, \dots, x_k$
so that $E(y_1) = \sqrt{\theta_1^2 + \dots + \theta_k^2}$, $E(y_i) = 0$ ($i = 2, \dots, k$). Since
$\bar{y}_1^2 + \dots + \bar{y}_k^2 = \bar{x}_1^2 + \dots + \bar{x}_k^2$ and since (6.11) depends only on $\bar{x}_1^2 + \dots + \bar{x}_k^2$,
it is seen that the sequence of expressions (6.11) formed for any sequence of
integers $n$ has a joint distribution which depends only on $\theta_1^2 + \dots + \theta_k^2$.
Hence $\beta(\theta)$ is constant on any sphere with center at the origin. Since (6.11)
is a strictly increasing function of $r_x$, it can be shown that $\beta(\theta)$ is a monotonically
decreasing function of $\theta_1^2 + \dots + \theta_k^2$. Hence, we can test the hypothesis
$H_0$ by the sequential process based on (6.10).

If $k = 1$, that is, if we test the mean value of a single normal variate, the
sphere $S(d)$ is a 0-dimensional sphere consisting of the two points $\theta_1 = +d$ and
$\theta_1 = -d$ and expression (6.10) reduces to

         $\dfrac{\frac{1}{2}\left\{\exp\left[-\frac{1}{2}\sum_{\alpha}(x_\alpha - d_0)^2\right] + \exp\left[-\frac{1}{2}\sum_{\alpha}(x_\alpha + d_0)^2\right]\right\}}{\exp\left[-\frac{1}{2}\sum_{\alpha} x_\alpha^2\right]}$

         $= \tfrac{1}{2} \exp\left[-\tfrac{1}{2} n d_0^2\right] \left\{\exp[n \bar{x} d_0] + \exp[-n \bar{x} d_0]\right\}$.
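The reduction for $k = 1$ lends itself to a small numerical check. The sketch below computes the ratio both from its definition and from the closed form $\exp[-\frac{1}{2} n d_0^2]\cosh(n\bar{x}d_0)$, and then applies the reject/accept/continue rule with the boundaries $A$ and $B$. The function names, the driver loop and the numerical values of $A$, $B$ and $d_0$ are assumptions made for illustration; they are not taken from the paper.

```python
import math

def ratio_direct(xs, d0):
    """Ratio (6.10) for k = 1 from its definition: the average of the two
    alternative likelihoods (theta = +d0 and theta = -d0) over the null likelihood."""
    plus = math.exp(-0.5 * sum((x - d0) ** 2 for x in xs))
    minus = math.exp(-0.5 * sum((x + d0) ** 2 for x in xs))
    null = math.exp(-0.5 * sum(x * x for x in xs))
    return 0.5 * (plus + minus) / null

def ratio_closed_form(xs, d0):
    """The same ratio written as exp(-n d0^2 / 2) * cosh(n * xbar * d0)."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.exp(-0.5 * n * d0 ** 2) * math.cosh(n * xbar * d0)

def sequential_test(observations, d0, A, B):
    """Hypothetical driver: returns (decision, number of observations used)."""
    seen = []
    for x in observations:
        seen.append(x)
        value = ratio_closed_form(seen, d0)
        if value >= A:
            return "reject H0", len(seen)
        if value <= B:
            return "accept H0", len(seen)
    return "continue sampling", len(seen)
```

With observations that are all zero the ratio equals $\exp[-\frac{1}{2} n d_0^2]$ and falls steadily toward $B$; with observations far from zero the $\cosh$ term dominates and the ratio crosses $A$ after a few steps.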

6.2. Sequential test of a composite hypothesis. We shall give only a brief
outline of the principles on which a sequential test of a composite hypothesis
can be based, since they are analogous to those for a simple hypothesis. Let
$X_1, \dots, X_p$ be a set of $p$ random variables and let $f(x_1, \dots, x_p, \theta_1, \dots, \theta_k)$
be the joint probability density function of these variables involving $k$ unknown
parameters $\theta_1, \dots, \theta_k$. Denote the set of all possible parameter points
$\theta = (\theta_1, \dots, \theta_k)$ by $\Omega$. Suppose that we wish to test the hypothesis $H_0$ that the
true parameter point $\theta$ is contained in the subset $\omega$ of $\Omega$. Let $\bar{\omega}$ be the set of
all points of $\Omega$ which are not contained in $\omega$. Furthermore, let $w_0(\theta)$ and $w_1(\theta)$
be two non-negative functions of $\theta$, called weight functions, such that

(6.16)   $\int_{\omega} w_0(\theta)\,d\theta = 1$ and $\int_{\bar{\omega}} w_1(\theta)\,d\theta = 1$.

If $\omega$ is a surface in the space $\Omega$ then the integral over $\omega$ is meant to be the surface
integral over $\omega$.

In testing a composite hypothesis the probability of an error of the first kind
need not necessarily be the same for all points $\theta$ in $\omega$. It will, in general, be a
function $\alpha(\theta)$ of the true point $\theta$ in $\omega$. Similarly the probability of an error of
the second kind is a function $\beta(\theta)$ of $\theta$ defined for all points in $\bar{\omega}$. Suppose that
we wish to construct a sequential test such that the weighted average
$\int_{\omega} w_0(\theta)\alpha(\theta)\,d\theta$ of the probabilities of errors of the first kind is a given value
$\alpha$, and the weighted average $\int_{\bar{\omega}} w_1(\theta)\beta(\theta)\,d\theta$ of the probabilities of errors of
the second kind is a given value $\beta$. Then the following sequential test can be
used: Denote by $H_0^*$ the hypothesis that the probability density in the sample
space of $n$ observations on $X_1, \dots, X_p$ is given by

(6.17)   $p_{0n} = \int_{\omega} w_0(\theta)\left[\prod_{\alpha} f(x_{1\alpha}, \dots, x_{p\alpha}, \theta_1, \dots, \theta_k)\right] d\theta$

and by $H_1^*$ the hypothesis that the density in the sample space is given by

(6.18)   $p_{1n} = \int_{\bar{\omega}} w_1(\theta)\left[\prod_{\alpha} f(x_{1\alpha}, \dots, x_{p\alpha}, \theta_1, \dots, \theta_k)\right] d\theta$.

The sequential probability ratio test for testing $H_0^*$ against the single alternative
$H_1^*$ provides a solution of our problem. If the constants $A$ and $B$ in this
sequential test are chosen so that the probability is $\alpha$ that we reject $H_0^*$ when $H_0^*$ is
true, and the probability is $\beta$ that we accept $H_0^*$ when $H_1^*$ is true, then for this
sequential test we have

         $\int_{\omega} w_0(\theta)\alpha(\theta)\,d\theta = \alpha$
and
         $\int_{\bar{\omega}} w_1(\theta)\beta(\theta)\,d\theta = \beta$.

This can be proved in the same way as the corresponding statement in the case
of a simple hypothesis.

Frequently we may require a sequential test procedure such that the least
upper bound of $\alpha(\theta)$ in $\omega$ is equal to a given $\alpha$ and $\beta(\theta)$ is less than or equal to a
given $\beta$ for all points $\theta$ whose “distance” (defined in some sense) from $\omega$ is greater
than or equal to a given positive value $d_0$. The “distance” of a parameter
point $\theta$ from $\omega$ may be defined by some function $\delta(\theta, \omega)$ which is positive if $\theta$
is not in $\omega$ and is zero if $\theta$ is in $\omega$. The distance function will be chosen in general
according to practical needs and mathematical convenience. For reasons similar
to those discussed in the case of a simple hypothesis, an appropriate sequential
test procedure with the desired properties can be found as follows: Let $\omega(d)$
be the set of all points $\theta$ for which $\delta(\theta, \omega) \ge d$. Let, furthermore, $w_0(\theta)$ and
$w_1(\theta)$ be two weight functions such that

(6.19)   $\int_{\omega} w_0(\theta)\,d\theta = \int_{\omega(d_0)} w_1(\theta)\,d\theta = 1$.

Denote by $H_0^*$ the hypothesis that the probability density in the sample space
of $n$ observations on $X_1, \dots, X_p$ is given by

(6.20)   $p_{0n} = \int_{\omega} w_0(\theta)\left[\prod_{\alpha=1}^{n} f(x_{1\alpha}, \dots, x_{p\alpha}, \theta)\right] d\theta$    $(n = 1, 2, \dots)$

and by $H_1^*$ the hypothesis that the probability density in the sample space of $n$
observations on $X_1, \dots, X_p$ is given by

(6.21)   $p_{1n} = \int_{\omega(d_0)} w_1(\theta)\left[\prod_{\alpha=1}^{n} f(x_{1\alpha}, \dots, x_{p\alpha}, \theta)\right] d\theta$    $(n = 1, 2, \dots)$

Consider the sequential probability ratio test for testing the simple hypothesis
$H_0^*$ against the single alternative $H_1^*$. For any $\theta$ in $\omega$ let $\alpha(\theta)$ be the
probability of accepting $H_1^*$ when $\theta$ is true, and for any $\theta$ in $\bar{\omega}$ let $\beta(\theta)$ be the
probability of accepting $H_0^*$ when $\theta$ is true. It is clear that $\alpha(\theta)$ and $\beta(\theta)$ depend on
the constants $A$ and $B$ used in the sequential process and on the weight functions
$w_0(\theta)$ and $w_1(\theta)$. For given $A$, $B$, $w_0(\theta)$ and $w_1(\theta)$ let $\beta[A, B, w_0(\theta), w_1(\theta)]$ be the
least upper bound of $\beta(\theta)$ in $\omega(d_0)$ and let $\alpha[A, B, w_0(\theta), w_1(\theta)]$ be the least upper
bound of $\alpha(\theta)$ in $\omega$. Consider the differences

         $\Delta\alpha[A, B, w_0(\theta), w_1(\theta)] = \alpha[A, B, w_0(\theta), w_1(\theta)] - \int_{\omega} w_0(\theta)\alpha(\theta)\,d\theta$
and
         $\Delta\beta[A, B, w_0(\theta), w_1(\theta)] = \beta[A, B, w_0(\theta), w_1(\theta)] - \int_{\omega(d_0)} w_1(\theta)\beta(\theta)\,d\theta$.

Determine $w_0(\theta)$ and $w_1(\theta)$ so that $\max[\Delta\alpha, \Delta\beta]$ is a minimum. For these
weight functions the least upper bound of $\alpha(\theta)$ in $\omega$ and the least upper bound of
$\beta(\theta)$ in $\omega(d_0)$ will be functions of $A$ and $B$ only. Finally, we determine $A$ and $B$
so that the least upper bound of $\alpha(\theta)$ in $\omega$ becomes $\alpha$, and the least upper bound
of $\beta(\theta)$ in $\omega(d_0)$ becomes $\beta$.

The determination of $w_0(\theta)$ and $w_1(\theta)$ involves the solution of problems in
the calculus of variations. However, in some important cases the solution of
the problem can easily be derived, since weight functions $w_0(\theta)$ and $w_1(\theta)$ can be
found for which $\Delta\alpha = \Delta\beta = 0$. Such a situation is given, for instance, in the
following case: Let $S(d)$ be the set of all points $\theta$ for which $\delta(\theta, \omega) = d$. Suppose
that we can find two weight functions $v_0(\theta)$ and $v_1(\theta)$ such that
$\int_{\omega} v_0(\theta)\,d\theta = \int_{S(d_0)} v_1(\theta)\,dS = 1$ ($dS$ denotes the infinitesimal surface element of $S(d_0)$) and
the sequential probability ratio test based on

         $\dfrac{\int_{S(d_0)} v_1(\theta)\left[\prod_{\alpha} f(x_{1\alpha}, \dots, x_{p\alpha}, \theta)\right] dS}{\int_{\omega} v_0(\theta)\left[\prod_{\alpha} f(x_{1\alpha}, \dots, x_{p\alpha}, \theta)\right] d\theta}$

has the following properties: (1) $\alpha(\theta)$ is constant in $\omega$; (2) $\beta(\theta)$ is constant on
$S(d)$ for any $d \ge d_0$; (3) $\beta(\theta)$ is strictly decreasing with increasing $d$ in the
domain $d \ge d_0$. Then for these weight functions we evidently have
$\Delta\alpha = \Delta\beta = 0$.

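Let us illustrate this by a simple example. Let $X$ be a normally distributed
variate with unknown mean $\mu$ and unknown variance $\sigma^2$. Suppose that we
want to test the hypothesis $H_0$ that $\mu = 0$ and that the distance of the point
$(\mu, \sigma)$ from the set $\omega$ is defined by $\left|\frac{\mu}{\sigma}\right|$.
The set $S(d_0)$ then consists of all points $(\mu, \sigma)$ for which $\mu = +d_0\sigma$ or $\mu = -d_0\sigma$.
The set $\omega$ consists of all points $(0, \sigma)$ where $\sigma$ can take any arbitrary positive
value. Let $r$ be a positive value. We define the weight functions $v_{0r}(\sigma)$ and
$v_{1r}(\mu, \sigma)$ as follows: $v_{0r}(\sigma) = \frac{1}{r}$ if $0 \le \sigma \le r$ and equals zero for all other values of
$\sigma$. The weight function $v_{1r}(\mu, \sigma)$ is equal to $\frac{1}{2r}$ if $0 \le \sigma \le r$ and $\mu = \pm d_0\sigma$ and equal
to zero otherwise. Hence

         $p_{0n} = \dfrac{1}{(2\pi)^{n/2}}\,\dfrac{1}{r} \int_0^r \dfrac{1}{\sigma^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum x_\alpha^2}{\sigma^2}\right] d\sigma$,

         $p_{1n} = \dfrac{1}{(2\pi)^{n/2}}\,\dfrac{1}{r} \left\{\dfrac{1}{2}\int_0^r \dfrac{1}{\sigma^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum(x_\alpha - d_0\sigma)^2}{\sigma^2}\right] d\sigma + \dfrac{1}{2}\int_0^r \dfrac{1}{\sigma^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum(x_\alpha + d_0\sigma)^2}{\sigma^2}\right] d\sigma\right\}$.

We consider the limiting case when $r \to \infty$. Then

(6.25)   $\lim_{r \to \infty} \dfrac{p_{1n}}{p_{0n}} = \dfrac{\frac{1}{2}\int_0^\infty \frac{1}{\sigma^n} \exp\left[-\frac{1}{2}\frac{\sum(x_\alpha - d_0\sigma)^2}{\sigma^2}\right] d\sigma + \frac{1}{2}\int_0^\infty \frac{1}{\sigma^n} \exp\left[-\frac{1}{2}\frac{\sum(x_\alpha + d_0\sigma)^2}{\sigma^2}\right] d\sigma}{\int_0^\infty \frac{1}{\sigma^n} \exp\left[-\frac{1}{2}\frac{\sum x_\alpha^2}{\sigma^2}\right] d\sigma}$.

The sequential test based on the ratio (6.25) provides a solution of the problem
if it can be shown to have the following three properties: (1) $\alpha(\theta)$ is constant in
$\omega$; (2) $\beta(\theta)$ is only a function of $\left|\frac{\mu}{\sigma}\right|$; (3) $\beta(\theta)$ is monotonically decreasing with
increasing $\left|\frac{\mu}{\sigma}\right|$. Denote $\frac{1}{n}\sum_{\alpha=1}^{n} x_\alpha$ by $\bar{x}$ and $\sum_{\alpha=1}^{n}(x_\alpha - \bar{x})^2$ by $S^2$. Since the
distribution of $\left|\frac{\bar{x}}{S}\right|$ depends only on $\left|\frac{\mu}{\sigma}\right|$, the first two properties are proved if we
show that the ratio (6.25) is a single valued function of $\left|\frac{\bar{x}}{S}\right|$.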

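First we show that the numerator of the ratio (6.25) is a homogeneous function
of $(x_1, \dots, x_n)$ of degree $-(n - 1)$. In fact, making the transformation
$\sigma = \lambda t$ we obtain

         $\int_0^\infty \left\{\dfrac{1}{\sigma^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum(\lambda x_\alpha - d_0\sigma)^2}{\sigma^2}\right] + \dfrac{1}{\sigma^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum(\lambda x_\alpha + d_0\sigma)^2}{\sigma^2}\right]\right\} d\sigma$

         $= \int_0^\infty \left\{\dfrac{1}{(\lambda t)^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum(\lambda x_\alpha - d_0\lambda t)^2}{\lambda^2 t^2}\right] + \dfrac{1}{(\lambda t)^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum(\lambda x_\alpha + d_0\lambda t)^2}{\lambda^2 t^2}\right]\right\} \lambda\,dt$

         $= \dfrac{1}{\lambda^{n-1}} \int_0^\infty \left\{\dfrac{1}{t^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum(x_\alpha - d_0 t)^2}{t^2}\right] + \dfrac{1}{t^n} \exp\left[-\dfrac{1}{2}\dfrac{\sum(x_\alpha + d_0 t)^2}{t^2}\right]\right\} dt$.

This proves that the numerator of (6.25) is a homogeneous function of degree
$-(n - 1)$. Similarly, it can be shown that the denominator of (6.25) is also a
homogeneous function of degree $-(n - 1)$. Thus the ratio (6.25) is a homogeneous
function of zero degree in the variables $x_1, \dots, x_n$.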

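It can be seen that (6.25) is a function of the two expressions $\sum x_\alpha^2$ and $\sum x_\alpha$
only; i.e.,

(6.26)   $\dfrac{p_{1n}}{p_{0n}} = \phi\left(\sum x_\alpha^2, \sum x_\alpha\right)$.

Let $v = \left|\sqrt{\sum x_\alpha^2}\right|$. Since (6.26) is a homogeneous function of zero degree, its
value is not changed by substituting $\frac{x_\alpha}{v}$ for $x_\alpha$. Hence,

(6.27)   $\dfrac{p_{1n}}{p_{0n}} = \phi\left(1, \dfrac{\sum x_\alpha}{v}\right)$.

Since $\phi\left(\sum x_\alpha^2, -\sum x_\alpha\right) = \phi\left(\sum x_\alpha^2, \sum x_\alpha\right)$, we see that $\frac{p_{1n}}{p_{0n}}$ is a single valued
function of $\frac{(\sum x_\alpha)^2}{v^2}$. Since $\frac{(\sum x_\alpha)^2}{v^2}$ is a single valued function of $\left|\frac{\bar{x}}{S}\right|$, we have
proved that $\frac{p_{1n}}{p_{0n}}$ is a single valued function of $\left|\frac{\bar{x}}{S}\right|$.

In order to prove property (3) of the sequential test based on the ratio (6.25),
we have only to show that (6.25) is a strictly increasing function of $\left|\frac{\bar{x}}{S}\right|$.
Since $\left|\frac{\bar{x}}{v}\right|$ is a strictly increasing function of $\left|\frac{\bar{x}}{S}\right|$, we have only to show that
(6.25) is a strictly increasing function of $\left|\frac{\bar{x}}{v}\right|$. The latter statement is obviously
proved if we show that (6.25) increases with increasing value $|\bar{x}|$ while keeping
$v$ fixed. For fixed value of $v$ the denominator of (6.25) is constant. Thus, we
have merely to show that the numerator of (6.25) increases with increasing
$|\bar{x}|$ while keeping $v$ fixed. This follows easily from the fact that

         $\exp\left[\dfrac{n d_0 \bar{x}}{\sigma}\right] + \exp\left[-\dfrac{n d_0 \bar{x}}{\sigma}\right]$

is a strictly increasing function of $|\bar{x}|$.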
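The properties claimed for the ratio (6.25) can be examined numerically. The sketch below is an illustration under assumptions that are not part of the paper: it substitutes $u = 1/\sigma$, which turns each integral into $\int_0^\infty u^{n-2}\exp[\cdot]\,du$, truncates the range at a heuristic cut-off, and uses Simpson's rule. The function names and the cut-off are hypothetical choices.

```python
import math

def simpson(g, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3.0

def ratio_6_25(xs, d0):
    """Numerical value of the limiting ratio (6.25) after the substitution u = 1/sigma."""
    n = len(xs)                      # n >= 2 is needed for the integrals to converge
    v2 = sum(x * x for x in xs)      # v^2 = sum of squares
    v = math.sqrt(v2)
    c = d0 * sum(xs)                 # equals d0 * n * xbar
    u_max = abs(c) / v2 + (10.0 + math.sqrt(n)) / v   # heuristic truncation point

    def numerator(u):
        # (1/2){exp[-(1/2) sum (x u - d0)^2] + exp[-(1/2) sum (x u + d0)^2]} * u^(n-2)
        return u ** (n - 2) * math.exp(-0.5 * v2 * u * u - 0.5 * n * d0 ** 2) * math.cosh(c * u)

    def denominator(u):
        return u ** (n - 2) * math.exp(-0.5 * v2 * u * u)

    return simpson(numerator, 0.0, u_max) / simpson(denominator, 0.0, u_max)
```

Rescaling all observations by a common factor, or changing all their signs, leaves the computed ratio unchanged, and for a fixed sum of squares the ratio is larger when $|\bar{x}|$ is larger, which is what the homogeneity and monotonicity arguments assert.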
NON-PARAMETRIC ESTIMATION. I. VALIDATION
OF ORDER STATISTICS

By H. Scheffé and J. W. Tukey
Syracuse University and Princeton University

1. Summary. Previous work on non-parametric estimation has concerned
three problems: (i) confidence intervals for an unknown quantile, (ii) population
tolerance limits, (iii) confidence bands for an unknown cumulative distribution
function (cdf). For problem (iii) a solution has been available which is valid
for any cdf whatever, but for (i) and (ii) it has heretofore been assumed that the
population has a continuous probability density. This paper validates the
existing solutions of (i) and (ii) assuming only a continuous cdf. It then modifies
these solutions so that they are valid for any cdf whatever.

2. Introduction. There are three problems of non-parametric estimation
(we exclude point-estimation) for which fairly satisfactory solutions are available;
their present status was summarized in a recent paper [4]. The purpose of this
series of articles is to extend and complete the theory of non-parametric
estimation in directions of both theoretical and practical interest.

In this series we shall employ the following conventions of notation: We
distinguish between a random variable and an arbitrary point in the Euclidean
space containing its domain by using a capital Roman letter for the former and
the corresponding lower case Roman letter for the latter. Thus if $X$ is a (scalar)
random variable, and $x$ a real number or $\pm\infty$, we speak of the probability that
$X \le x$ and denote it by $Pr\{X \le x\}$. Roman capitals will also be used to denote
cumulative distribution functions¹ (cdf's): A monotone non-decreasing function
$F(x)$ will be called the cdf of $X$ if $F(x + 0) = Pr\{X \le x\}$. The definition of
$F(x)$ at its points of discontinuity will be immaterial. Again, $E = (X_1, \dots, X_n)$
will denote a random sample from a population with cdf $F(x)$, whereas
$e = (x_1, \dots, x_n)$ will denote a point in the sample space $R_n$. If $t$ is a function of $e$
only, $t = \varphi(e)$, then the random variable $T = \varphi(E)$ is a statistic. The order
statistics of the sample $E$ are defined to be $-\infty, Z_1, \dots, Z_n, +\infty$, where
$z_1 \le z_2 \le \dots \le z_n$ is a rearrangement of $x_1, x_2, \dots, x_n$. We shall write
$Z_0 = -\infty$, $Z_{n+1} = +\infty$. The device of including $+\infty$ and $-\infty$ among the order
statistics will enable us to avoid special statements to cover the case of one-sided
estimation. Confidence coefficients will be denoted by $1 - \alpha$. Finally, it will
be convenient to symbolize² the following three classes of cdf's: $\Omega_0$ is the class of
all univariate cdf's $F$; $\Omega_2$, the class of all continuous $F$; $\Omega_4$, the class of all $F$ with
continuous derivative $F'(x)$.

¹ One of the authors wishes to point out the need of a clear, concise, and adequate term
for this basic and important concept.

² The notation follows [3].
We now list the three problems. In each case it is understood that the
solution sought is to be valid for all cdf's in some chosen class. The names³
associated with the problems are (i) W. R. Thompson, K. R. Nair, (ii) Wilks, (iii)
Wald, Wolfowitz, Kolmogoroff.

(i) To find confidence intervals for an unknown quantile $q_p$, where $q_p$ is
defined by $F(q_p) = p$, $0 < p < 1$; in other words, to find statistics $T_1$, $T_2$ such
that⁴

(1.1)    $Pr\{T_1 < q_p < T_2 \mid F\} = 1 - \alpha$.

(ii) To find tolerance limits $T_1$, $T_2$ which, with confidence $1 - \alpha$, will cover a
proportion $b$ or more of the population, that is,

(1.2)    $Pr\{F(T_2) - F(T_1) \ge b \mid F\} = 1 - \alpha$.

(iii) To find a confidence band for an unknown cdf $F$, that is, a random region
$R(E)$ in the $x,y$-plane such that

(1.3)    $Pr\{R(E) \text{ covers } g \mid F\} = 1 - \alpha$,

where $g$ is the graph of $y = F(x)$.

The existing solutions of problem (iii) are known to be valid for $F$ in $\Omega_2$,
but those of problems (i) and (ii) have been validated only for $F$ in $\Omega_4$. The
extension to $F$ in $\Omega_2$ is an immediate consequence of the theorem in section 4;
this section also contains a discussion of some other implications of the theorem.
In section 5 the appropriate modifications of the solutions of problems (i) and
(ii) are found which extend their validity to the general case $F$ in $\Omega_0$. Whereas
Pitman ([1]; also [4], p. 310) has shown how non-parametric tests may be
extended to the possibly discontinuous case, the only solution of the three
estimation problems previously extended to this case is that of Kolmogoroff for problem
(iii). Extension from $\Omega_2$ to $\Omega_0$ is of considerable practical interest, not only
in the case of populations ordinarily considered discrete, but also as affecting
the problem of the finiteness of the number of significant figures in measurements
and the resulting occurrence of “ties” in ranked measurements. Before making
these extensions we discuss in the next section the transformations on which
they are based.

3. Two useful transformations of random variables. We shall reserve the
symbol $X^*$ for a random variable having a uniform distribution on the interval
from 0 to 1. Its cdf is

(1.4)    $U(x^*) = Pr\{X^* \le x^*\} = \begin{cases} 0 & \text{if } x^* \le 0, \\ x^* & \text{if } 0 \le x^* \le 1, \\ 1 & \text{if } x^* \ge 1. \end{cases}$

³ For bibliography see [4].

⁴ The notation $Pr\{R \mid F_0\}$ denotes the probability of the relation $R$ being true, calculated
under the assumption that the cdf of the population is $F_0(x)$.
The device of transforming from any random variable $X$ with cdf $F$ in $\Omega_2$
to one with cdf $U$ was early used by Karl Pearson and more recently by many
others; it is known in the literature as the “probability integral transformation.”
We define the transformation $x^* = h_F(x)$ as follows: For $-\infty < x < +\infty$,
$h_F(x) = F(x)$, $h_F(+\infty) = +\infty$, $h_F(-\infty) = -\infty$. If $F$ is in $\Omega_2$, the following
statements are evident for the transform $X^* = h_F(X)$: $X^*$ has $U(x^*)$ as its cdf.
With $X_i^* = h_F(X_i)$, a random sample $E = (X_1, \dots, X_n)$ from $F$ transforms
into a random sample $E^* = (X_1^*, \dots, X_n^*)$ from $U$. The order statistics
$\{Z_i\}$ of $E$ transform into the order statistics $\{Z_i^*\}$ of $E^*$ with $Z_i^* = h_F(Z_i)$,
$i = 0, 1, \dots, n + 1$.

It is easily seen that if $F$ is not in $\Omega_2$, the above transformation $Y = h_F(X)$
does not give $Y$ the cdf $U$; indeed, if $F$ is not in $\Omega_2$, the cdf of any single-valued
function $Y$ of $X$ is also not in $\Omega_2$, for there will be at least one point $x = x_0$ with
positive probability, and likewise for its transform $y_0$. Nevertheless our
arguments in section 4 depend on relating a random variable with arbitrary cdf $F$ in
$\Omega_0$ to the uniformly distributed $X^*$. While it is not possible to transform from
$X$ to $X^*$ without introducing a further random process, it is possible to transform
directly from $X^*$ to $X$. This suffices for our needs. We shall always denote
this transformation by $X = g_F(X^*)$. The following definition of the function
$x = g_F(x^*)$ makes it independent of the normalization of $F$ at its discontinuities:

(1.5)    $F(x - 0) \le U(x^*) \le F(x + 0)$.

A sketched diagram may aid the reader in following the argument: To every
$x^*$ ($-\infty \le x^* \le +\infty$) there corresponds at least one $x$, and this $x$ is unique
unless it lies in an interval to which $F$ assigns zero probability. In the latter
case we shall assume that some $x$ in the interval is designated to be $g_F(x^*)$. It
will be seen that it is immaterial which $x$ is thus chosen. However if $x = -\infty$
or $+\infty$ is in an interval of constancy of $F$ we specify $g_F(-\infty) = -\infty$,
$g_F(+\infty) = +\infty$.

To prove that $g_F(X^*)$ has the cdf $F(x)$ and thus can be identified with $X$, it
is sufficient to prove that $Pr\{g_F(X^*) \le x\} = F(x + 0)$. Now $g_F(X^*) \le x$ if and
only if $X^* \le x_+^*$, where

         $x_+^* = \sup_{g_F(x^*) \le x} x^*$.

Hence $Pr\{g_F(X^*) \le x\} = Pr\{X^* \le x_+^*\} = U(x_+^*) = F(x + 0)$. It follows
that a random sample $E^*$ from $U$ transforms into a random sample $E$ from $F$.
The transformation preserves the relation $\le$; that is, if $x_a = g_F(x_a^*)$,
$x_b = g_F(x_b^*)$, then $x_a^* \le x_b^*$ implies $x_a \le x_b$. This means that the order statistics
$\{Z_i^*\}$ of $E^*$ transform into the order statistics $\{Z_i\}$ of $E$. We remark that
$x_a^* < x_b^*$ does not imply $x_a < x_b$; there is trouble when $x_b^* < 0$ or $x_a^* > 1$, and
more serious trouble if $x_a^*$ and $x_b^*$ both go into the same discontinuity of $F$.
However, we shall need to utilize the fact that $x_a < x_b$ implies $x_a^* < x_b^*$.
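A small discrete example may make the transformation $x = g_F(x^*)$ concrete. The sketch below is illustrative only: the three-point population, its probabilities and the helper names are assumptions, and the rule "take the first value whose cumulative probability reaches $u$" is just one admissible choice satisfying (1.5) for $0 \le u \le 1$.

```python
import bisect

# Hypothetical discrete population (dyadic probabilities keep the arithmetic exact).
VALUES = [0, 1, 2]
PROBS = [0.25, 0.5, 0.25]
CUM = [0.25, 0.75, 1.0]          # F(x + 0) at each value

def g_F(u):
    """Map u = U(x*) in [0, 1] to a value x with F(x - 0) <= u <= F(x + 0)."""
    idx = bisect.bisect_left(CUM, u)          # first value whose F(x + 0) >= u
    return VALUES[min(idx, len(VALUES) - 1)]

def F_left(x):
    """F(x - 0): probability of values strictly below x."""
    return sum(p for v, p in zip(VALUES, PROBS) if v < x)

def F_right(x):
    """F(x + 0): probability of values at or below x."""
    return sum(p for v, p in zip(VALUES, PROBS) if v <= x)

grid = [j / 100 for j in range(101)]
images = [g_F(u) for u in grid]
```

On this grid the images are non-decreasing, so order statistics of the uniform sample go over into order statistics of the transformed sample, while many distinct values of $u$ collapse onto the same point of discontinuity, which is the source of the "ties" discussed in the text.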
4. Extension to continuous cdf's. A sufficient condition on $T_1$ and $T_2$ for a
solution (1.2) of problem (ii) to be valid for all $F$ in $\Omega_2$ is clearly that the joint
distribution of $F(T_1)$ and $F(T_2)$ be independent of $F$ in $\Omega_2$. If
$Pr\{F(T_i) = p \mid F\} = 0$ ($i = 1, 2$), then (1.1) is equivalent to

(1.6)    $Pr\{F(T_1) < p < F(T_2) \mid F\} = 1 - \alpha$,

and so a sufficient condition that a solution (1.1) of problem (i) be valid for all
$F$ in $\Omega_2$ is again that the joint distribution of $F(T_1)$ and $F(T_2)$ be independent of
$F$ in $\Omega_2$. We are thus led to consider sufficient conditions on a set $T_1, T_2, \dots,
T_r$ of statistics, which will insure that the joint distribution of $F(T_1), F(T_2),
\dots, F(T_r)$ be independent of $F$ in $\Omega_2$.

Theorem: A sufficient condition for the joint distribution of $F(T_1), F(T_2), \dots,
F(T_r)$ to be independent of $F$ in $\Omega_2$ is that the $\{T_j\}$ be a subset of the order statistics
$\{Z_i\}$ of the sample.

To prove the theorem it will suffice to show that the joint distribution of the
set of $n$ random variables $F(Z_1), F(Z_2), \dots, F(Z_n)$ is independent of $F$ in $\Omega_2$.
Let the cdf of the joint distribution be

(1.7)    $G_F(x_1^*, x_2^*, \dots, x_n^*) = Pr\{F(Z_1) \le x_1^*, \dots, F(Z_n) \le x_n^* \mid F\}$.

Employing the transformation $x^* = h_F(x)$ discussed in section 3, we see that the
above probability equals

(1.8)    $Pr\{Z_1^* \le x_1^*, \dots, Z_n^* \le x_n^*\}$,

where $Z_0^*, Z_1^*, \dots, Z_{n+1}^*$ are the order statistics of a random sample $E^*$ from the
uniform cdf $U$. But this probability does not depend on $F$.

Since the existing solutions of problems (i) and (ii) are obtained by taking
$T_1$ and $T_2$ to be order statistics, we have validated these solutions for all $F$ in
$\Omega_2$. That the existing solutions of problem (iii) are valid for $F$ in $\Omega_2$ has been
demonstrated by their authors; this is however also an easy consequence of the
above theorem. The sufficiency condition expressed by this theorem together
with a necessity condition of Robbins' [2] may indicate a natural path to the
formulation and solution of further problems of non-parametric estimation.

From a theoretical point of view it is of interest to note that even in those
pathological cases where no probability density function exists for the cdf $F$
in $\Omega_2$ ($F$ is non-absolutely continuous), the joint distribution (1.7) of $F(Z_1),
F(Z_2), \dots, F(Z_n)$ always possesses a density. That this density is $n!$ for
$0 \le F(Z_1) \le F(Z_2) \le \dots \le F(Z_n) \le 1$, and zero elsewhere, is evident if we consider
(1.8). By “integrating out” the other variables we are led to the following
practically useful result (it is well known for $F$ in $\Omega_4$): Choose any set $\{r_i\}$
of $s$ integers ($1 \le r_1 < r_2 < \dots < r_s \le n$), and consider the joint distribution
of $F(Z_{r_1}), F(Z_{r_2}), \dots, F(Z_{r_s})$. This has a probability density function
$f(t_1, t_2, \dots, t_s)$, providing $F$ is in $\Omega_2$, given by the formula

(1.9)    $f(t_1, t_2, \dots, t_s) = \dfrac{n!\, t_1^{r_1 - 1} (1 - t_s)^{n - r_s}}{(r_1 - 1)!\,(n - r_s)!} \prod_{i=1}^{s-1} \dfrac{(t_{i+1} - t_i)^{r_{i+1} - r_i - 1}}{(r_{i+1} - r_i - 1)!}$

for $0 \le t_1 \le t_2 \le \dots \le t_s \le 1$, and $f = 0$ elsewhere. As is conventional, the
result of applying $\prod_{i=1}^{0}$ is to be interpreted as unity, and the meaning of $f$ is
given by

         $Pr\{F(Z_{r_i}) \le a_i\ (i = 1, 2, \dots, s) \mid F\} = \int_{-\infty}^{a_1} \int_{-\infty}^{a_2} \cdots \int_{-\infty}^{a_s} f(t_1, t_2, \dots, t_s)\,dt_s \cdots dt_2\,dt_1$.

5. Extension to discontinuous cdf's. Suppose we have a solution of problem
(i) based on order statistics and hence valid for $F$ in $\Omega_2$, say,

(1.10)   $Pr\{Z_k < q_p < Z_t \mid F\} = 1 - \alpha$,

where $0 \le k < t \le n + 1$. In particular this is valid for the uniform case,

(1.11)   $Pr\{Z_k^* < p < Z_t^*\} = 1 - \alpha$.

We now transform from the uniform cdf $U$ to an arbitrary $F$ in $\Omega_0$ by means of
the transformation $x = g_F(x^*)$ described in section 3. Suppose $q_p$ is defined
by $q_p = g_F(p)$. This means the quantile $q_p$ of the distribution with cdf $F$ is
determined from the relation

         $F(q_p - 0) \le p \le F(q_p + 0)$,

which assigns to the quantile its usual meaning if $F(x)$ is continuous and
non-constant at $x = q_p$, and a sensible definition if $F$ is discontinuous or constant
at $q_p$. From the discussion in section 3 we have

         $(Z_k < q_p < Z_t)$ implies $(Z_k^* < p < Z_t^*)$ implies $(Z_k \le q_p \le Z_t)$,

and hence the probability relations

         $Pr\{Z_k < q_p < Z_t \mid F\} \le Pr\{Z_k^* < p < Z_t^*\} \le Pr\{Z_k \le q_p \le Z_t \mid F\}$.

Substituting (1.11), we have

(1.12)   $Pr\{Z_k < q_p < Z_t \mid F\} \le 1 - \alpha \le Pr\{Z_k \le q_p \le Z_t \mid F\}$.

The statistical interpretation of (1.12) is the following: Consider any solution
(1.10) of problem (i), giving a confidence interval for the quantile $q_p$, valid for $F$
in $\Omega_2$. Then with the same values of $n$, $k$, $t$, and $\alpha$, the probability of the random
interval from $Z_k$ to $Z_t$ covering the unknown quantile $q_p$ is $\le 1 - \alpha$ for the open
interval, $\ge 1 - \alpha$ for the closed interval, no matter what the unknown cdf $F$.
If $F$ is continuous, the two probabilities are of course equal.
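Relation (1.12) can be illustrated by an exact computation. The sketch below rests on facts and choices that are not spelled out in the text: for a continuous cdf, the number of observations below $q_p$ is binomial with parameters $n$ and $p$, so $Pr\{Z_k^* < p < Z_t^*\} = \sum_{j=k}^{t-1}\binom{n}{j}p^j(1-p)^{n-j}$; for a population with a jump at $q_p$, the counts of observations below, at and above $q_p$ are trinomial. The function names and the numerical population are hypothetical.

```python
from math import comb

def nominal_coverage(n, k, t, p):
    """Pr{Z_k* < p < Z_t*} for a uniform sample: k <= (number of observations < p) <= t - 1."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, t))

def discrete_coverage(n, k, t, p_lt, p_eq):
    """Open- and closed-interval coverage of q when F(q - 0) = p_lt and the jump at q is p_eq."""
    p_gt = 1.0 - p_lt - p_eq
    open_prob = closed_prob = 0.0
    for a in range(n + 1):               # a observations strictly below q
        for b in range(n + 1 - a):       # b observations equal to q
            c = n - a - b                # c observations strictly above q
            weight = comb(n, a) * comb(n - a, b) * p_lt ** a * p_eq ** b * p_gt ** c
            if a >= k and a + b <= t - 1:      # Z_k < q < Z_t
                open_prob += weight
            if a + b >= k and a <= t - 1:      # Z_k <= q <= Z_t
                closed_prob += weight
    return open_prob, closed_prob

# Illustrative case: median (p = 0.5) of a population with F(q - 0) = 0.4 and F(q + 0) = 0.6.
nominal = nominal_coverage(10, 2, 9, 0.5)
open_prob, closed_prob = discrete_coverage(10, 2, 9, 0.4, 0.2)
```

In this example the nominal confidence coefficient lies between the open-interval and closed-interval coverages, and when the jump `p_eq` is zero the two coverages coincide with the nominal value.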
To extend the solution of problem (ii) to the general case $F$ in $\Omega_0$, suppose we
have a solution (1.2) using order statistics, say $T_1 = Z_k$, $T_2 = Z_t$
($0 \le k < t \le n + 1$). Such a solution will be valid for all $F$ in $\Omega_2$, in particular for $F = U$,

         $Pr\{U(Z_t^*) - U(Z_k^*) \ge b\} = 1 - \alpha$.

Given now any arbitrary distribution $F$, we again use the transformation
$x = g_F(x^*)$. From (1.5),

         $F(Z_i - 0) \le U(Z_i^*) \le F(Z_i + 0)$    $(i = k, t)$.

Hence

         $B_- \le B^* \le B_+$,

where

         $B_- = F(Z_t - 0) - F(Z_k + 0)$,

         $B^* = U(Z_t^*) - U(Z_k^*)$,

         $B_+ = F(Z_t + 0) - F(Z_k - 0)$.

The implications

         $(B_- \ge b)$ implies $(B^* \ge b)$ implies $(B_+ \ge b)$

yield the relations

         $Pr\{B_- \ge b\} \le Pr\{B^* \ge b\} \le Pr\{B_+ \ge b\}$.

These may be written

(1.13)   $Pr\{F(Z_t - 0) - F(Z_k + 0) \ge b \mid F\} \le 1 - \alpha \le Pr\{F(Z_t + 0) - F(Z_k - 0) \ge b \mid F\}$.

To interpret (1.13), let us say that a Borel set $S$ covers a proportion $\pi$ of a
population with cdf $F(x)$ if $\int_S dF(x) = \pi$. If $S$ is an interval from $x'$ to $x''$,
then the proportion covered by $S$ is $F(x'' + 0) - F(x' - 0)$ if $S$ is closed, and
$F(x'' - 0) - F(x' + 0)$ if $S$ is open. The proportion covered by a point $x_0$
is the jump $F(x_0 + 0) - F(x_0 - 0)$ of the cdf $F$ at $x_0$. The statistical meaning
of (1.13) is now clear: For the random interval from $Z_k$ to $Z_t$, the probability
that the open interval cover a proportion $\ge b$ of the population is $\le 1 - \alpha$, the
probability that the closed interval cover a proportion $\ge b$ of the population is
$\ge 1 - \alpha$, regardless of the population. Again, for a continuous $F$ the two
probabilities are equal.
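For the uniform case the confidence coefficient attached to a pair of order statistics can be computed in closed form. The sketch below relies on a standard result that is not derived in the text: the coverage $U(Z_t^*) - U(Z_k^*)$ has a beta distribution with parameters $t - k$ and $n - t + k + 1$, so that $Pr\{B^* \ge b\} = \sum_{j=0}^{t-k-1}\binom{n}{j}b^j(1-b)^{n-j}$. The function name is an assumption.

```python
from math import comb

def tolerance_confidence(n, k, t, b):
    """Pr{U(Z_t*) - U(Z_k*) >= b} for a uniform sample of size n, with 0 <= k < t <= n + 1."""
    m = t - k    # the coverage is Beta(m, n - m + 1); sum the binomial terms j = 0, ..., m - 1
    return sum(comb(n, j) * b ** j * (1 - b) ** (n - j) for j in range(m))
```

For example, with $n = 2$, $k = 1$, $t = 2$ the interval is the sample range and the formula gives $(1 - b)^2$; by (1.13) this value is an upper bound for the open interval and a lower bound for the closed interval when $F$ is arbitrary.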
ON A TEST FOR RANDOMNESS BASED ON SIGNS OF DIFFERENCES¹

By Henry B. Mann
Ohio State University

1. Introduction. It has been pointed out by J. Wolfowitz [1] that we cannot
expect a test for randomness to be most powerful with respect to every possible
alternative. It is therefore necessary to find tests designed to distinguish a
random sample of observations from the same population from a sample coming
from some particular class $\Omega$ of distributions. Such a test need be consistent
in the sense of Wald and Wolfowitz [2] only with respect to alternatives in the
class $\Omega$.

Let $x_1, \dots, x_n$ be the measurable quality characteristics of $n$ units of a
manufactured article. We shall assume that the distribution of $x_i$ is continuous.
According to Shewhart the production process is termed “under statistical
control” if $x_1, \dots, x_n$ can be regarded as a random sample of $n$ independent
items each coming from the same population with known or unknown
distribution function.

In a random sample $p_i = P(x_i > x_{i+1}) = \frac{1}{2}$, where $P(E)$ denotes the
probability that $E$ will hold. The class $\Omega$ of alternatives which we shall consider is
described as follows. The cumulative distribution of $x_i$ is $f_i$ and the $f_i$,
$i = 1, 2, \dots$, are such that

         $\sum_{i=1}^{n-1} p_i - \dfrac{n - 1}{2} = \lambda_n (n - 1)$,    $\liminf_{n \to \infty} \lambda_n = \lambda > 0$.

Such a situation may, for instance, obtain if the production process is under
statistical control except for occasionally but not too infrequently occurring
periods during which the quality of the product decreases, after which decrease
statistical control is immediately restored. If the decreases in quality are sharp
enough or the periods of decrease long enough, then the alternative will belong
to the class $\Omega$ described before.

To give a practical example, consider a drill, which after some period of use will
wear off so that the quality of the manufactured article will decrease until the
drill is exchanged. After replacement of the drill by a new one, statistical
control is immediately restored. Now, if the drill is not replaced in time, the
periods of decrease in quality will be long and the rate of decrease will become
rapid so that the sequence of distribution functions will satisfy the conditions
of the class $\Omega$. A similar situation occurs also in time studies. For instance,
in the foregoing example, the time necessary for drilling one hole will tend to
increase when the drill is too long in use.

The following test first proposed by Moore and Wallis [3] for the study of
economic time series seems appropriate for our purpose: Let $x_1, \dots, x_n$ be the
sample and form the sequence $x_2 - x_1, \dots, x_n - x_{n-1}$. Let $S$ be the number
of negative differences in this sequence. Clearly, the distribution of $S$ is
independent of the distribution of $x_i$ provided the sample is an independent random
sample from a continuous distribution. Under one of the alternatives of the
class $\Omega$, $S$ will in a sample of $n$ tend to be larger than in a random sample if $\lambda_n > 0$.
Hence $S$ may be used as a statistic to distinguish between randomness and any
of the alternatives of the class $\Omega$. The distribution of $S$ was tabulated by
Moore and Wallis [3] for $n \le 12$. They also found empirically that $S$ approaches
a normal distribution. The asymptotic normality of the distribution of $S$
can be proved rigorously in a way analogous to the proof of Theorem 1 of a
paper by Wolfowitz [4]. The first four moments of $S$ were obtained by Moore
and Wallis; the fourth moment, however, only by empirical methods. In
this paper we shall derive a formula which makes it possible to compute the
moments of $S$ recursively. With the help of this formula we shall indicate an
alternative proof of the asymptotic normality of $S$ using the method of moments.
Finally, we shall derive a lower bound for the power of the $S$ test with respect
to alternatives in $\Omega$ valid for large $n$ and depending only on $\lambda_n$.

¹ Research under a grant of the Research Foundation of the Ohio State University.


2. The moments of S. Let $P_n(S)$ be the number of permutations of $n$ variables
with $S$ negative differences. MacMahon [5] has shown that

(1) $P_n(S) = (S + 1)\,P_{n-1}(S) + (n - S)\,P_{n-1}(S - 1).$
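
Recurrence (1) can be iterated mechanically. The following is a minimal sketch in Python (not part of the original paper); the function name `negative_difference_counts` is an assumption chosen for illustration. It starts from the single permutation of one variable, which has no differences at all, and builds each row from the previous one.

```python
def negative_difference_counts(n):
    """Return [P_n(0), ..., P_n(n-1)] by iterating recurrence (1).

    P_n(S) is the number of permutations of n variables whose sequence
    of successive differences contains exactly S negative terms.
    """
    counts = [1]  # n = 1: one permutation, zero negative differences
    for m in range(2, n + 1):
        prev = counts
        counts = []
        for s in range(m):
            # (S + 1) * P_{m-1}(S), defined only while S <= m - 2
            stay = (s + 1) * prev[s] if s < len(prev) else 0
            # (m - S) * P_{m-1}(S - 1), defined only for S >= 1
            grow = (m - s) * prev[s - 1] if s >= 1 else 0
            counts.append(stay + grow)
    return counts


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, negative_difference_counts(n))
```

Dividing each row by $n!$ gives the null distribution of $S$; each row sums to $n!$, which is a convenient check on the arithmetic.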


Using (1), Moore and Wallis [3] have tabulated the two-tail probabilities of
$\left| S - \frac{n-1}{2} \right|$.

In using their table for our purpose, one has to keep in mind that we are using
a one tail region; therefore $P(S \ge B)$ is for $S > \frac{n-1}{2}$ one half of the value
tabulated by Moore and Wallis.

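Clearly the first moment of $S$ is $\frac{n-1}{2}$, since the expected number of $-$ signs
equals the expected number of $+$ signs. To find higher moments we multiply (1)
by $\left(S - \frac{n-1}{2}\right)^i$, divide by $n!$ and sum over $S$. Then we obtain

(2) $E_n\left[\left(S - \frac{n-1}{2}\right)^i\right] = \frac{1}{n} E_{n-1}\left[\left(S - \frac{n-1}{2}\right)^i (S + 1)\right] + \frac{1}{n} E_{n-1}\left[\left(S - \frac{n-3}{2}\right)^i (n - 1 - S)\right],$

where $E_n[f(S)]$ denotes the expectation of $f(S)$ in permutations of $n$ variables.
From (2) we have, putting $S - E(S) = x$,

(3) $E_n(x^i) = \frac{1}{n} E_{n-1}\left[x\left(x - \tfrac12\right)^i - x\left(x + \tfrac12\right)^i\right] + \frac12 E_{n-1}\left[\left(x + \tfrac12\right)^i + \left(x - \tfrac12\right)^i\right].$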


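From the symmetry of the distribution as well as from (3) it may be seen that
all odd moments are 0 and therefore

$$\tfrac12 E\left[\left(x + \tfrac12\right)^{2i} + \left(x - \tfrac12\right)^{2i}\right] = E\left(x + \tfrac12\right)^{2i},$$

$$E\left[x\left(x - \tfrac12\right)^{2i} - x\left(x + \tfrac12\right)^{2i}\right] = -2E\left[\left(x + \tfrac12\right)^{2i+1}\right] + E\left(x + \tfrac12\right)^{2i}.$$

Hence we obtain from (2)

(4) $E_n(x^{2i+1}) = 0, \quad i = 0, 1, \ldots,$

$$E_n(x^{2i}) = \frac{n+1}{n} E_{n-1}\left[\left(x + \tfrac12\right)^{2i}\right] - \frac{2}{n} E_{n-1}\left[\left(x + \tfrac12\right)^{2i+1}\right].$$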


If all moments below the $2i$th moment are known, (4) becomes a difference
equation whose solution yields the $2i$th moment for $n > 2i$. Thus one obtains

$$\sigma_n^2(S) = E_n(x^2) = \frac{n+1}{12}, \qquad E_n(x^4) = \frac{5(n+1)^2 - 2(n+1)}{240},$$

$$E_n(x^6) = \frac{35(n+1)^3 - 42(n+1)^2 + 16(n+1)}{4032}.$$
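
The closed forms for the second, fourth and sixth moments can be checked against exact enumeration. The sketch below is an illustration in Python under stated assumptions, not the author's computation: it tabulates $P_n(S)$ from recurrence (1), forms exact central moments with rational arithmetic, and compares them with the polynomials $(n+1)/12$, $[5(n+1)^2 - 2(n+1)]/240$ and $[35(n+1)^3 - 42(n+1)^2 + 16(n+1)]/4032$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def negative_difference_counts(n):
    """Return [P_n(0), ..., P_n(n-1)] by iterating recurrence (1)."""
    counts = [1]
    for m in range(2, n + 1):
        prev = counts
        counts = []
        for s in range(m):
            stay = (s + 1) * prev[s] if s < len(prev) else 0
            grow = (m - s) * prev[s - 1] if s >= 1 else 0
            counts.append(stay + grow)
    return counts


def central_moment(n, power):
    """Exact E_n(x^power), where x = S - (n - 1)/2."""
    mean = Fraction(n - 1, 2)
    total = factorial(n)
    return sum(
        Fraction(count, total) * (s - mean) ** power
        for s, count in enumerate(negative_difference_counts(n))
    )


def closed_form(n, power):
    """Polynomial expressions for the 2nd, 4th and 6th moments."""
    m = n + 1
    if power == 2:
        return Fraction(m, 12)
    if power == 4:
        return Fraction(5 * m**2 - 2 * m, 240)
    if power == 6:
        return Fraction(35 * m**3 - 42 * m**2 + 16 * m, 4032)
    raise ValueError("only the 2nd, 4th and 6th moments are covered here")


if __name__ == "__main__":
    for n in range(7, 11):  # n > 2i for i = 1, 2, 3
        for power in (2, 4, 6):
            print(n, power, central_moment(n, power) == closed_form(n, power))
```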


It is not difficult to prove from (4) by induction that

$$\lim_{n \to \infty} \frac{E_n(x^{2i})}{\sigma_n^{2i}(S)} = (2i - 1)(2i - 3) \cdots 3 \cdot 1.$$

To do this one proves first by induction that $E_n(x^{2i})$ is for $n > 2i$ a
polynomial in $n$ of degree $i$. It can then be proved by induction that the first
coefficient of this polynomial is $(2i - 1)(2i - 3) \cdots 3 \cdot 1 / 12^i$, from which the
assertion follows. Since $(2i - 1) \cdots 3 \cdot 1$ are the moments of a normal
distribution with variance 1, it follows that

$$\frac{\left(S - \frac{n-1}{2}\right)\sqrt{12}}{\sqrt{n+1}}$$

is in the limit normally distributed with mean 0 and variance 1. This result
follows, however, also easily from Theorem 2 of a paper by Wolfowitz [4].

It is also possible to show by induction from equation (4) that for $n > 2i$ the
$2i$th moment of $S$ is smaller than the corresponding moment of a normal
distribution with variance $\frac{n+1}{12}$.


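3. The power of the S test. Let us assume now that one of the alternatives
of the class $\Omega$ is true. This is to say $p_i = P(x_i > x_{i+1}) = \frac12 + \epsilon_i$,
$\sum \epsilon_i = \lambda_n (n - 1)$, $\liminf \lambda_n = \lambda > 0$. Let

$$z_i = \begin{cases} 1 & \text{if the } i\text{th sign is } -, \\ 0 & \text{if the } i\text{th sign is } +. \end{cases}$$

We shall show that

$$P(z_{i+1} = 1 \mid z_i = 1) \le P(z_{i+1} = 1).$$

We have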


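$$\left[\int_{-\infty}^{x_1} df_2(x_2) \int_{-\infty}^{x_2} df_3(x_3)\right] \int_{x_1}^{\infty} df_2(x_2) \le \int_{-\infty}^{x_1} df_3(x_3) \int_{-\infty}^{x_1} df_2(x_2) \int_{x_1}^{\infty} df_2(x_2)$$

$$\le \int_{-\infty}^{x_1} df_2(x_2) \left[\int_{x_1}^{\infty} df_2(x_2) \int_{-\infty}^{x_2} df_3(x_3)\right].$$

Adding $\int_{-\infty}^{x_1} df_2(x_2) \left[\int_{-\infty}^{x_1} df_2(x_2) \int_{-\infty}^{x_2} df_3(x_3)\right]$ to both sides of this inequality we
have

$$\int_{-\infty}^{x_1} df_2(x_2) \int_{-\infty}^{x_2} df_3(x_3) \le \int_{-\infty}^{x_1} df_2(x_2) \left[\int_{-\infty}^{\infty} df_2(x_2) \int_{-\infty}^{x_2} df_3(x_3)\right].$$

Integrating both sides with respect to $x_1$, we obtain

$$\int_{-\infty}^{\infty} df_1(x_1) \int_{-\infty}^{x_1} df_2(x_2) \int_{-\infty}^{x_2} df_3(x_3) \le \left[\int_{-\infty}^{\infty} df_1(x_1) \int_{-\infty}^{x_1} df_2(x_2)\right] \left[\int_{-\infty}^{\infty} df_2(x_2) \int_{-\infty}^{x_2} df_3(x_3)\right]$$

or

$$P(z_1 = 1 \text{ and } z_2 = 1) \le P(z_1 = 1) \cdot P(z_2 = 1).$$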


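From this it follows that $\sigma_{z_i z_{i+1}} \le 0$. Since $\sigma_{z_i}^2 = \frac14 - \epsilon_i^2$ we have

$$\sigma_S^2 \le \sum_{i=1}^{n-1} \left(\tfrac14 - \epsilon_i^2\right) \le \frac{n-1}{4}\left(1 - 4\lambda_n^2\right).$$

Moreover $E(S) = \frac{n-1}{2} + \lambda_n (n - 1)$.

Let $\lambda' = \lambda$ if $\lambda < \frac12$ and $0 < \lambda' < \lambda$ if $\lambda = \frac12$. The critical region is for
sufficiently large $n$ given approximately by $S \ge \frac{n-1}{2} + t\sqrt{\frac{n+1}{12}}$, where $t$
depends on the level of significance $\alpha$ and must be chosen so that
$\frac{1}{\sqrt{2\pi}} \int_t^{\infty} e^{-\frac12 x^2}\,dx = \alpha$. Hence, if we can show that under any alternative $H$ of the class
$\Omega$ and for any $\epsilon > 0$

(5) $P\left(S \le E(S) - \frac{t}{2}\sqrt{(n-1)(1 - 4\lambda'^2)}\right) \le \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-t} e^{-\frac12 x^2}\,dx + \epsilon$

for every $t \ge l > 0$, $n > N(\epsilon, H, l)$, then we shall be able to give a lower bound
for the power of the $S$ test. The power of the $S$ test is approximately given
by $P\left(S \ge \frac{n-1}{2} + t\sqrt{\frac{n+1}{12}}\right)$.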

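From (5) we have

(6) $P\left(S \ge \frac{n-1}{2} + t\sqrt{\frac{n+1}{12}}\right) \ge \frac{1}{\sqrt{2\pi}} \int_{\frac{t\sqrt{n+1} - 2\lambda_n (n-1)\sqrt3}{\sqrt{(3n-3)(1 - 4\lambda'^2)}}}^{\infty} e^{-\frac12 x^2}\,dx - \epsilon$

for

$$\frac{t\sqrt{n+1} - 2\lambda_n (n-1)\sqrt3}{\sqrt{3(n-1)(1 - 4\lambda'^2)}} \le -l < 0, \qquad n > N(\epsilon, H, l).$$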


The author considers it safe to assume that (6) holds with a fairly small $\epsilon$ for
$n > 12$ if $\lambda'$ in (6) is replaced by $\lambda'_n$, where $\lambda'_n = \lambda_n$ if $\lambda_n < \frac12$ and $\lambda'_n < \frac12$ if $\lambda_n = \frac12$,
and if $\lambda'_n$ is not too close to $\frac12$. He bases this belief on the rapidity with which
the distribution of $S$ approaches normality under the null hypothesis of randomness,
and on the fact that at least under the null hypothesis the moments of $S$ are
smaller than the corresponding moments of a normal distribution. It may also
be seen from the following derivation of (6) that in many cases the power of the $S$
test will be considerably above the lower bound given in (6).

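To prove (5), we need the following two lemmas.

Lemma 1. Let $P(x \le t) = f(t)$. Let further $E(z) = 0$, $E(z^2) = c$. Then for
every $\delta > 0$

(7) $f(t + \delta) + \frac{c}{\delta^2} \ge P(x + z \le t) \ge f(t - \delta) - \frac{c}{\delta^2}.$

Proof: Applying Tschebycheff's inequality we have

$$P(x + z \le t) \le P(x \le t + \delta) + P(x > t + \delta \text{ and } z < -\delta) \le P(x \le t + \delta) + P(z < -\delta) \le f(t + \delta) + \frac{c}{\delta^2},$$

$$P(x + z \le t) \ge P(x \le t - \delta \text{ and } z \le \delta) \ge P(x \le t - \delta) - P(z > \delta) \ge f(t - \delta) - \frac{c}{\delta^2}.$$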
Lemma 2. Let $\{x_i\}$, $i = 1, 2, \ldots$, be a sequence of independent random variables
with mean 0, bounded $k$th absolute moment, $k > 2$, and variance $\sigma_i^2$. Let $M > 0$

H***- 2 «... - - * . «!+••• +Xn 


and lim sup =—* ^ M . Form the sequence of random variables y n * 

»—*oo H 

iften /or any « > 0 and any t > l > 0 


M Vn 


( 8 ) 


P(Vn < ~ t) < 


i rv- 

\/2r JLoo 


dx + € for n > N(e, Z). 


Proof. Form a sequence m« with lim m a * 0. Let 


where 



- 22 a *i 


$\sum'$ denotes summation over all $i$ for which $\sigma_i^2 > m_\alpha$ and all sums extend from
one to $n$.

Let $f_n^\alpha$ be the distribution of $x_n^\alpha$; then by Lemma 1


m-t + «) + || > P(y. < -0 > /“(-< - «) - |f|. 


Now we distinguish two cases.

1st Case. The number of integers $i$ with $\sigma_i^2 > m_\alpha$ is for some $\alpha$ of order $n$.
In this case $\{f_n^\alpha\}$ differs arbitrarily little from a sequence of normal distributions
with mean 0 and the upper limit of the variances at most 1.

2nd Case. The number of integers $i$ with $\sigma_i^2 > m_\alpha$ is for every $\alpha$ of smaller
order than $n$. In this case $x_n^\alpha$ converges stochastically to 0. In both cases
(8) holds true since $m_\alpha$ can be chosen arbitrarily small.

We can now prove (5). It follows easily from Tschebycheff's theorem that
(5) is true if $\lambda = \frac12$. Hence we may assume $\lambda < \frac12$. Let $z_i$ be defined as at
the beginning of this section. Form

*_ 2(z, - E(z t )) 

Vi () '-4r*+, V(n - 1)(1 - 4X*)’ 

» _ 2fot - E(z ik )) _ t ~p,-> _ 2(Zj - E(z d) _ 

Ui V(n - 1)(1 - 4X 2 ) ’ Vn iJ &+: V(n - 1)(1 - 4X 2 ) 

where m! = gk is the largest integer multiple of k which does not exceed (n — 1 ). 
We form further 


x k n = 53 v) , z k — y! u k . 

J-l J-i 

Since 4 X 5 ) — jfc(l *- 4X 2 ) ^ ^°^ ows ^ rom F< EMMA 1 that 

the distribution of $\frac{2(S - E(S))}{\sqrt{(n-1)(1 - 4\lambda^2)}}$ differs arbitrarily little from the
distribution of $x_n^k$ for sufficiently large $n$ and $k$. The second and the third absolute
moment of $\sqrt{n-1}\,v_j$ are bounded. Hence $\sqrt{n-1}\,v_j$ fulfills the conditions
of Lemma 2. The application of Lemma 2 yields (5) and consequently (6).

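The integer $N(\epsilon, H, l)$ is independent of $t$ provided the lower limit of the
integral does not exceed $-l$. Hence we have proved

Theorem. Let $t_1, t_2, \ldots$ be any sequence of numbers satisfying the condition

$$\frac{t_n\sqrt{n+1} - 2(n-1)\lambda_n\sqrt3}{\sqrt{(3n-3)(1 - 4\lambda'^2)}} \le -l < 0,$$

where $\lambda' = \liminf_{n\to\infty} \lambda_n$ if $\liminf_{n\to\infty} \lambda_n < \frac12$ and $0 < \lambda' < \frac12$ otherwise. Let $P_n(S, H)$
be the power of the $S$ test with respect to the alternative $H$ and critical region
$S \ge \frac{n-1}{2} + t_n\sqrt{\frac{n+1}{12}}$. Then

(9) $\liminf_{n\to\infty} \left[ P_n(S, H) \Big/ \frac{1}{\sqrt{2\pi}} \int_{\frac{t_n\sqrt{n+1} - 2(n-1)\lambda_n\sqrt3}{\sqrt{(3n-3)(1 - 4\lambda'^2)}}}^{\infty} e^{-\frac12 x^2}\,dx \right] \ge 1.$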
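
To get a feel for the size of the bound in (9), the normal integral can be evaluated numerically. The sketch below is a hedged illustration in Python, not a procedure given in the paper: it assumes a constant $\lambda_n = \lambda' = $ `lam` with $0 \le$ `lam` $< \frac12$, takes $t$ as the upper $\alpha$ point of the standard normal distribution, and uses the hypothetical function name `power_lower_bound`. The bound is an asymptotic statement and only applies when the lower limit of integration is negative.

```python
from math import sqrt
from statistics import NormalDist


def power_lower_bound(n, lam, alpha):
    """Normal integral appearing in (9) for constant lambda_n = lam.

    Assumptions (not from the paper): lam is held fixed with
    0 <= lam < 1/2, and t is the upper-alpha normal quantile.
    """
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1.0 - alpha)
    lower_limit = (t * sqrt(n + 1) - 2 * (n - 1) * lam * sqrt(3)) / sqrt(
        (3 * n - 3) * (1 - 4 * lam**2)
    )
    # Integral of the standard normal density from lower_limit to infinity
    return 1.0 - std_normal.cdf(lower_limit)


if __name__ == "__main__":
    for lam in (0.05, 0.1, 0.2):
        print(lam, round(power_lower_bound(100, lam, 0.05), 4))
```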


It is worthwhile to remark that (9) is sharp. That is to say there exist
alternatives for which the left side of (9) is equal to 1. This is obviously the case
for any alternative with $P(x_i > x_{i+1}) = \frac12 + \lambda$ and $P(z_i = 1 \text{ and } z_{i+1} = 1) = P(z_i = 1) \cdot P(z_{i+1} = 1)$.
These conditions are, for instance, fulfilled by the
alternative given by $P(x_{i+1} = a - \delta - \cdots - \delta^i) = \frac12 + \lambda$, $P(x_{i+1} = c + \delta + \cdots + \delta^i) = \frac12 - \lambda$,
$i = 1, 2, \ldots$, where $(a - c) > \frac{2\delta}{1 - \delta} > 0$.

If $t_n = t$ for every $n$ then (9) implies the consistency of the test if the order
of $\lambda_n$ is larger than $1/\sqrt{n}$. It may also be seen that the test is not consistent
with respect to alternatives for which $\lambda_n$ is of order at most equal to $\frac{1}{\sqrt n}$.
This remark refers of course only to alternatives for which $x_i$ is independent of
$x_j$ for $i \ne j$.
THE ASYMPTOTIC DISTRIBUTION OF RUNS OF CONSECUTIVE ELEMENTS


By Irving Kaplansky 
New York City 


In a permutation of $1, 2, \ldots, n$ let $r$ denote the number of instances in which
$i$ is next to $i + 1$, i.e., in which either of the successions $(i, i + 1)$ or $(i + 1, i)$
occurs. Thus for the permutation 234651, $r = 3$. In [3] Wolfowitz¹ has proposed
the use of $r$ for significance tests in the non-parametric case, and in [4]
he has shown that asymptotically $r$ has the Poisson distribution with mean
value 2. It is to be noted that $W(R)$, the number of runs as defined by Wolfowitz,
is equal to $n - r$.
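
The statistic $r$ is easy to compute directly. As a small illustration (a Python sketch with hypothetical function names, not part of the original note), the code below counts successions in a permutation and obtains the exact distribution of $r$ for small $n$ by enumerating all permutations.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def successions(perm):
    """Number of adjacent positions holding consecutive integers."""
    return sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) == 1)


def exact_distribution(n):
    """Exact P(n, r) for r = 0, 1, ..., by brute-force enumeration.

    Only practical for small n, since all n! permutations are visited.
    """
    counts = Counter(successions(p) for p in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}


if __name__ == "__main__":
    print(successions((2, 3, 4, 6, 5, 1)))
    print(exact_distribution(5))
```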

In this note we shall derive more explicit results concerning the asymptotic
distribution of $r$. In a random permutation (all permutations being regarded
as equally probable) let the probability of exactly $r$ successions as above be
$P(n, r)$, and let $M(n, k)$ denote the $k$-th factorial moment of the distribution,
that is

$$M(n, k) = \sum_r r(r - 1) \cdots (r - k + 1)\,P(n, r).$$


We shall show that

(1) $M(n, k) = 2^k \left[ 1 - \frac{k+1}{2k} \binom{k}{1} \frac{k}{n} + \frac{k+2}{2^2 k} \binom{k}{2} \frac{k(k-1)}{n(n-1)} - \cdots \right],$

(2) $P(n, r) = \frac{2^r e^{-2}}{r!} \left[ 1 - \frac{r^2 - 3r}{2n} + \frac{r^4 - 8r^3 + 9r^2 + 22r - 16}{8n(n-1)} \right] + O(n^{-3}).$


Since $2^k$ is the $k$-th factorial moment of the Poisson distribution with mean 2,
either of these results serves to verify the asymptotic Poisson character of the
distribution of $r$.

It would be possible to obtain some kind of explicit formula for the general
term of (2), but there seems to be no reasonably simple form.

Proof of (1). Let $A_i$ denote the event "$i + 1$ comes right after $i$" and $B_i$
the event "$i$ comes right after $i + 1$" ($i = 1, \ldots, n - 1$). The joint probability
of $k$ of these $2n - 2$ events is either 0, if they are incompatible,
or $(n - k)!/n!$ if they are compatible, for in the latter case we in effect assign
positions for $k$ of the elements and are then free to permute the $n - k$ others.
Let $f(n, k)$ denote the number of ways of selecting $k$ compatible events. Then
it is known that ([1], eq. (40))

(3) $M(n, k) = k!\,f(n, k)\,(n - k)!/n! = f(n, k) \Big/ \binom{n}{k}.$



1 I am indebted to Dr. Wolfowitz for calling my attention to this problem, and to its
identity with what I called the "n-kings problem" in [2].

The relations of incompatibility can be summarized by the statement that
$A_i$ is incompatible with $B_j$ if $|i - j| \le 1$. In view of (3), our task thus reduces
to the proof of the following combinatorial lemma.

Lemma. Suppose $2n - 2$ objects $A_1, \ldots, A_{n-1}, B_1, \ldots, B_{n-1}$ are given.
Let $f(n, k)$ denote the number of ways of selecting $k$ objects with the restriction that
$A_i$ and $B_j$ must not both be chosen when $|i - j| \le 1$. Then

(4) $\frac{f(n, k)}{2^k} = \sum_{i=0}^{k} (-1)^i\, \frac{k + i}{2^i k} \binom{k}{i} \binom{n - i}{k - i}.$


Proof. We split the acceptable selections into two subsets: those which
include $A_{n-1}$ and those which do not. Let the latter be $g(n, k)$ in number.
Since the selections which include $A_{n-1}$ must omit $B_{n-1}$ and $B_{n-2}$, it is clear that
they are $g(n - 1, k - 1)$ in number. Thus

(5) $f(n, k) = g(n, k) + g(n - 1, k - 1).$

Similarly we split the selections which omit $A_{n-1}$ according as they omit or
include $B_{n-1}$; we obtain

(6) $g(n, k) = f(n - 1, k) + g(n - 1, k - 1).$

Elimination of $g$ from (5) and (6) yields²

(7) $f(n, k) = f(n - 1, k) + f(n - 1, k - 1) + f(n - 2, k - 1).$
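
The quantities $f(n, k)$ and the relations (3), (4) and (7) can be checked by enumeration for small $n$. The following Python sketch is an illustration under stated assumptions, not the author's method: `f_brute` counts compatible selections directly, `f_closed` evaluates the sum in (4) as reconstructed here (for $1 \le k \le n$), and `factorial_moment` computes $M(n, k)$ from all $n!$ permutations. All three names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial


def f_brute(n, k):
    """Count selections of k objects among A_1..A_{n-1}, B_1..B_{n-1}
    in which no A_i is chosen together with a B_j with |i - j| <= 1."""
    objects = [("A", i) for i in range(1, n)] + [("B", i) for i in range(1, n)]

    def compatible(selection):
        return not any(
            p[0] != q[0] and abs(p[1] - q[1]) <= 1
            for p, q in combinations(selection, 2)
        )

    return sum(1 for sel in combinations(objects, k) if compatible(sel))


def f_closed(n, k):
    """Evaluate formula (4), assuming 1 <= k <= n."""
    total = sum(
        Fraction((-1) ** i * (k + i) * comb(k, i) * comb(n - i, k - i), 2**i * k)
        for i in range(k + 1)
    )
    return 2**k * total


def factorial_moment(n, k):
    """k-th factorial moment of r over all permutations of 1..n."""
    total = Fraction(0)
    for perm in permutations(range(1, n + 1)):
        r = sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) == 1)
        falling = 1
        for j in range(k):
            falling *= r - j
        total += falling
    return total / factorial(n)


if __name__ == "__main__":
    for n in range(3, 6):
        print(n, [f_brute(n, k) for k in range(1, n)])
```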

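We can now make an inductive proof of (4). Assuming (4), we have

$$\frac{f(n, k) - f(n - 1, k)}{2^k} = \sum_i (-1)^i\, \frac{k + i}{2^i k} \binom{k}{i} \binom{n - i - 1}{k - i - 1},$$

$$\frac{f(n - 1, k - 1) + f(n - 2, k - 1)}{2^k} = \sum_i (-1)^i\, \frac{k + i - 1}{2^{i+1}(k - 1)} \binom{k - 1}{i} \left[ \binom{n - i - 1}{k - i - 1} + \binom{n - i - 2}{k - i - 1} \right]$$

$$= \sum_i (-1)^i \binom{n - i - 1}{k - i - 1} \left[ \frac{k + i - 1}{2^i (k - 1)} \binom{k - 1}{i} + \frac{k + i - 2}{2^i (k - 1)} \binom{k - 1}{i - 1} \right].$$

In view of the identity

$$\frac{k + i}{k} \binom{k}{i} = \frac{k + i - 1}{k - 1} \binom{k - 1}{i} + \frac{k + i - 2}{k - 1} \binom{k - 1}{i - 1}$$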

we now readily verify that the right hand side of (4) satisfies (7). To complete 
the induction we must check the appropriate boundary conditions. According 
to (4) we have 




= 0 , 


$f(n, 1) = 2n - 2$, both as they should be.


2 This recursion formula is essentially the same as equation (20) in [2].

Note. There are various other formulas for $f(n, k)$; we have selected (4) as
it exhibits the asymptotic behaviour best. In an unpublished investigation
John Riordan obtained a neat representation as a hypergeometric function:

$$f(n, k) = 2(n - k)\,F(1 - k,\ 1 + k - n;\ 2;\ 2)$$

and derived corresponding recursion formulas. Essentially the same result
was given by Wolfowitz [3]. Still another formula given by Riordan is

'<»■*>- 2 §C 7‘X”-*■•)■ 

A symbolic version is given in §5 of [2]. 

Proof of (2). From the formula of Poincaré ([1], eq. (29))

$$r!\,P(n, r) = \sum_{k \ge r} (-1)^{k+r} M(n, k)/(k - r)!$$

or, in a cabalistic symbolic form, $P(n, r) = M^r e^{-M}/r!$. We substitute the
successive terms of (1) and we may let the sum run to infinity at a cost of $O(n^{-m})$
for any positive $m$. The first term contributes³

$$\sum_{k=r}^{\infty} (-1)^{k+r}\, 2^k/(k - r)! = 2^r \sum_{i=0}^{\infty} (-2)^i/i! = 2^r e^{-2}.$$

Again since

$$k^2 + k = (k - r)(k - r - 1) + (2r + 2)(k - r) + r^2 + r,$$

the next term yields

$$\sum_{k=r}^{\infty} (-1)^{k+r} (k^2 + k)\, 2^{k-1}/(k - r)! = 2^r e^{-2} \left[ 2 - 2r - 2 + \frac{r^2 + r}{2} \right]$$

and so on in obvious fashion.
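
The inversion formula lends itself to a direct numerical check. The sketch below (Python, with hypothetical function names; an illustration rather than the author's computation) obtains $M(n, k)$ exactly from (3) and (4), inverts it by Poincaré's formula to get $P(n, r)$, and compares the result with the Poisson term and with the first two terms of (2).

```python
from fractions import Fraction
from math import comb, exp, factorial


def factorial_moment(n, k):
    """M(n, k) = f(n, k) / C(n, k), with f(n, k) taken from formula (4)."""
    if k == 0:
        return Fraction(1)
    if k >= n:
        return Fraction(0)  # at most n - 1 successions can occur
    f_over_2k = sum(
        Fraction((-1) ** i * (k + i) * comb(k, i) * comb(n - i, k - i), 2**i * k)
        for i in range(k + 1)
    )
    return 2**k * f_over_2k / comb(n, k)


def exact_probability(n, r):
    """P(n, r) by Poincare's inversion of the factorial moments."""
    total = sum(
        (-1) ** (k - r) * factorial_moment(n, k) / factorial(k - r)
        for k in range(r, n)
    )
    return total / factorial(r)


def poisson_term(r):
    """Poisson probability with mean 2."""
    return 2**r * exp(-2) / factorial(r)


def two_term_approximation(n, r):
    """First two terms of expansion (2)."""
    return poisson_term(r) * (1 - (r * r - 3 * r) / (2 * n))


if __name__ == "__main__":
    n = 10
    for r in range(8):
        print(r, float(exact_probability(n, r)), poisson_term(r),
              two_term_approximation(n, r))
```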

Some indication of the asymptotic behavior of $P(n, r)$ is afforded by the
following table for $n = 10$. It is to be noted that, because of the form of (2),
the approach to Poisson is much more rapid for $r = 0$ and 3 than for other $r$.

  r    P(10, r)    Poisson    First two terms of (2)
  0      .132        .135         .135
  1      .300        .271         .298
  2      .305        .271         .298
  3      .179        .180         .180
  4      .065        .090         .072
  5      .015        .036         .018
  6      .002        .012         .001
  7      .000        .003        -.001


3 My thanks are due to Mr. Riordan for correcting an error in this section, and for many
helpful suggestions concerning the entire paper.
ON THE APPROXIMATE DISTRIBUTION OF RATIOS

By P. L. Hsu

National University of Peking

The purpose of this paper is to apply Cramér's theorem of asymptotic expansion¹
and Berry's theorem² to study the approximate distribution of ratios of the
following two types:

(I) $Z = \frac{1}{n}(Y_1 + \cdots + Y_n) \Big/ \frac{1}{m}(X_1 + \cdots + X_m),$

(II) $Z = Y \Big/ \frac{1}{m}(X_1 + \cdots + X_m) = Y/\bar X.$

In (I) the $X_i$, $Y_j$ are independent, the $Y_j$ are equi-distributed,³ and the $X_i$ are
equi-distributed and positive. In (II) $X_1, \ldots, X_m$, $Y$ are independent and
positive, and the $X_i$ are equi-distributed.

1. The ratio (I). Assume that

(I1) the absolute $k$th moment of $X_i$ and that of $Y_j$ are finite and positive, where $k$ is a fixed integer $\ge 3$,

(I2) the distribution of $X_i$ and that of $Y_j$ are non-singular.

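Let

$$\xi = E(X_i), \qquad \eta = E(Y_j), \qquad \sigma^2 = E(X_i^2) - \xi^2, \qquad \tau^2 = E(Y_j^2) - \eta^2$$

and

$$U = \frac{\sqrt m}{\sigma}(\bar X - \xi), \qquad V = \frac{\sqrt n}{\tau}(\bar Y - \eta).$$

Let $F(x)$, $G(x)$ and $H(x)$ be respectively the distribution functions of $Z$, $U$ and
$V$. Let

$$b = \left( \frac{\sigma^2 x^2}{m} + \frac{\tau^2}{n} \right)^{1/2}, \qquad u = \frac{\xi x - \eta}{b}.$$

Then the relation $Z \le x$ is equivalent to

$$-\frac{x \sigma U}{b\sqrt m} + \frac{\tau V}{b\sqrt n} \le u.$$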

1 H. Cramér. Random Variables and Probability Distributions (1937), Chap. 7.

2 A. C. Berry. "The accuracy of the Gaussian approximation to the sum of independent
variates", Trans. Amer. Math. Soc., Vol. 49 (1941), pp. 122-136.

3 The $Y_j$ are said to be equi-distributed if all $Y_j$ have the same distribution function.
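For simplicity we shall assume $x > 0$; the results are, however, general. Then
the distribution functions of $-\frac{x \sigma U}{b\sqrt m}$ and $\frac{\tau V}{b\sqrt n}$ are

$$1 - G\left( -\frac{b\sqrt m}{\sigma x}\, y \right), \qquad H\left( \frac{b\sqrt n}{\tau}\, y \right).$$

Hence, by the theorem of convolution,

(1) $F(x) = \int_{-\infty}^{\infty} \left\{ 1 - G\left( -\frac{b\sqrt m}{\sigma x}(u - y) \right) \right\} dH\left( \frac{b\sqrt n}{\tau}\, y \right).$

Here we recall the theorems of Cramér and Berry: Under the conditions (I1)
and (I2)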

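(2) $G(x) = \Phi(x) + \sum_{\nu=1}^{k-3} \frac{P_\nu(x)}{m^{\nu/2}} + \frac{D_k}{m^{\frac12(k-2)}},$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac12 t^2}\,dt, \qquad P_\nu(x) = \sum_{j=1}^{\nu} c_{\nu j}\, \Phi^{(\nu + 2j)}(x),$$

and $|D_k|$ is less than a positive number which depends only on $k$ and the
distribution of $X_i$. If $k = 3$, condition (I2) may be removed.⁴
Analogously,

(3) $H(x) = \Phi(x) + \sum_{\nu=1}^{k-3} \frac{Q_\nu(x)}{n^{\nu/2}} + \frac{D'_k}{n^{\frac12(k-2)}}, \qquad Q_\nu(x) = \sum_{j=1}^{\nu} d_{\nu j}\, \Phi^{(\nu + 2j)}(x).$

In the sequel we shall use the letter $A_k$ to denote an unspecified quantity such
that $|A_k|$ is less than a positive number which depends only on $k$, the
distribution of $X_i$ and the distribution of $Y_j$.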

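Using (2) we have

(4) $1 - G(-x) = \Phi(x) + \sum_{\nu=1}^{k-3} \frac{(-1)^\nu P_\nu(x)}{m^{\nu/2}} + \frac{A_k}{m^{\frac12(k-2)}}$

and making this substitution in (1) we get

$$F(x) = \int_{-\infty}^{\infty} \left\{ \Phi\left( \frac{b\sqrt m}{\sigma x}(u - y) \right) + \sum_{\nu=1}^{k-3} \frac{(-1)^\nu}{m^{\nu/2}} P_\nu\left( \frac{b\sqrt m}{\sigma x}(u - y) \right) \right\} dH\left( \frac{b\sqrt n}{\tau}\, y \right) + \frac{A_k}{m^{\frac12(k-2)}},$$

4 This last assertion constitutes Berry's theorem.

and so by partial integration,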

F(x) = £ H ( bV ~ n{u T - »> ) ( 6 -^) 


£ H dP, ( b -^) 


2) * 


Making the transformation y — azv/b\/m and writing 


we get 


Fix) = £ H(du - OvWiv) dv + E £ //(aw - 
-7 I 1 A * 

U + tk m’« ’ + 

For Jo we use (3) and obtain 

h = £ Hau - Pv)&(v) dv + E y £ Q»(om - Pv)$'(v) dv + . 

For l v we use (3) with k replaced by k — v. Thus 

^ $(au - (3v)P',(v) dv + E ^ £ Qc(aw - &v)P',{v) dv + w)(t 4_ r) . 
Combining these results we get 

(6) F(z) = f $(au — 0v)&(v) dv + 2 ~~h f Q'i 0 ™ ~ &>)&(*>) duo 
JLoo rv u J-oo 

+ £ f &(otU - $v)P',(v) dv 

v-1 m v/J X-oe 

fc—3 Js—3 —v / _| \V i* oo 

+ L Z £j £- 2 / Qm(o« - i&OPfr) * + ft, 

y_i u«i m u n* u J-oo 


A* 

n i(*~2) * 


5>(cm — fiv)P' P (v) dv 


Ak , A* I 
m i(^-2) T" w J(fc-2) 


W >-/2 n i(i-2-v) 


/ jy j_V“ 2 

\\/m y/n) 


Now by (5), $\alpha > 0$ and $\alpha^2 - \beta^2 = 1$. For such values of $\alpha$ and $\beta$, however, it
follows easily from the theorem of convolution that

$$\int_{-\infty}^{\infty} \Phi(\alpha u - \beta v)\, \Phi'(v)\,dv = \Phi(u).$$

As differentiation under the integration sign is justified by the boundedness of
the derivatives of $\Phi$ we have

$$\alpha^p \int_{-\infty}^{\infty} \Phi^{(p)}(\alpha u - \beta v)\, \Phi'(v)\,dv = \Phi^{(p)}(u).$$

Repeated partial integration then gives

$$\int_{-\infty}^{\infty} \Phi^{(p)}(\alpha u - \beta v)\, \Phi^{(q)}(v)\,dv = \beta^{q-1} \int_{-\infty}^{\infty} \Phi^{(p+q-1)}(\alpha u - \beta v)\, \Phi'(v)\,dv = \frac{\beta^{q-1}}{\alpha^{p+q-1}}\, \Phi^{(p+q-1)}(u).$$

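Hence

$$\int_{-\infty}^{\infty} Q_\mu(\alpha u - \beta v)\, \Phi'(v)\,dv = \sum_{j=1}^{\mu} d_{\mu j} \int_{-\infty}^{\infty} \Phi^{(\mu + 2j)}(\alpha u - \beta v)\, \Phi'(v)\,dv = \sum_{j=1}^{\mu} \frac{d_{\mu j}}{\alpha^{\mu + 2j}}\, \Phi^{(\mu + 2j)}(u),$$

$$\int_{-\infty}^{\infty} \Phi(\alpha u - \beta v)\, P'_\nu(v)\,dv = \sum_{j=1}^{\nu} c_{\nu j}\, \beta^{\nu + 2j} \int_{-\infty}^{\infty} \Phi^{(\nu + 2j)}(\alpha u - \beta v)\, \Phi'(v)\,dv = \sum_{j=1}^{\nu} c_{\nu j}\, \frac{\beta^{\nu + 2j}}{\alpha^{\nu + 2j}}\, \Phi^{(\nu + 2j)}(u),$$

$$\int_{-\infty}^{\infty} Q_\mu(\alpha u - \beta v)\, P'_\nu(v)\,dv = \sum_{i=1}^{\nu} \sum_{j=1}^{\mu} d_{\mu j}\, c_{\nu i} \int_{-\infty}^{\infty} \Phi^{(\mu + 2j)}(\alpha u - \beta v)\, \Phi^{(\nu + 2i + 1)}(v)\,dv = \sum_{i=1}^{\nu} \sum_{j=1}^{\mu} c_{\nu i}\, d_{\mu j}\, \frac{\beta^{\nu + 2i}}{\alpha^{\mu + \nu + 2i + 2j}}\, \Phi^{(\mu + \nu + 2i + 2j)}(u).$$

Making all these substitutions in (6) we obtain the final result

$$F(x) = \Phi(u) + \sum_{\mu=1}^{k-3} \frac{1}{n^{\mu/2}} \sum_{j=1}^{\mu} \frac{d_{\mu j}}{\alpha^{\mu + 2j}}\, \Phi^{(\mu + 2j)}(u) + \sum_{\nu=1}^{k-3} \frac{(-1)^\nu}{m^{\nu/2}} \sum_{j=1}^{\nu} c_{\nu j}\, \frac{\beta^{\nu + 2j}}{\alpha^{\nu + 2j}}\, \Phi^{(\nu + 2j)}(u)$$

$$+ \sum_{\nu=1}^{k-3} \sum_{\mu=1}^{k-3-\nu} \frac{(-1)^\nu}{m^{\nu/2}\, n^{\mu/2}} \sum_{i=1}^{\nu} \sum_{j=1}^{\mu} c_{\nu i}\, d_{\mu j}\, \frac{\beta^{\nu + 2i}}{\alpha^{\mu + \nu + 2i + 2j}}\, \Phi^{(\mu + \nu + 2i + 2j)}(u) + A_k \left( \frac{1}{\sqrt m} + \frac{1}{\sqrt n} \right)^{k-2}.$$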

If $k = 3$, the result remains true without the condition (I2).

2. The ratio (II). Here we make the following assumptions:

(II1) The $k$th moment of $X_i$ is finite and positive, where $k$ is a fixed integer
$\ge 3$, $E(X_i) = 1$,⁵ $E(X_i^2) - 1 = \sigma^2$.

(II2) The distribution of $X_i$ is non-singular.


5 As the case $E(X_i) = 0$ is excluded, there is no loss of generality in this assumption.

Let $U = \sqrt m\,(\bar X - 1)/\sigma$, and $F(x)$, $G(x)$ and $H(x)$ be respectively the
distribution functions of $Z$, $U$ and $Y$. Then




Because of the positiveness of $X_i$ and $Y$ we may always assume $x > 0$. Then,
by the theorem of convolution,

$$F(x) = \int_0^{\infty} \left\{ 1 - G\left( -\frac{\sqrt m\,(x - y)}{\sigma x} \right) \right\} dH(y).$$

Using (4) we have 

/ \fm, (x - y (-l)" ( Vm(a; — y) 


*(*) = 




m^ 2 


’V 


<TX 


)} 




m' 


where, as throughout the rest of this paper, $A_k$ represents an unspecified quantity
such that $|A_k|$ is less than a positive number depending only on $k$, the
distribution of $X_i$ and the distribution of $Y$. By partial integration we get


, _4*_ 


(7) F(x) = l" H(x -y) 4*(— ? ) + 2 ( 1) ' P ’(At) 

^°° I V CX / *'“ 1 “^ r /2 

An interesting special case is the following: Suppose that (113) 7/ a " > (j) 
exists and is continuous for all x > 0; (114) the functions 




(v = 1, • • • , lc — 3) 

are bounded, i.e. 

{»(*) = A* ; 

(113) there is a positive constant c < 1 such that 

x k - 2 H a ~»(y) = A* 

for all x > 0 and (1 — c)x < «/ < (1 + c)x. Under these conditions we have 
(-1 )VxVtf w (x) 


i‘-vh )-g 


j/J m F/2 


+ (* - 2)!m*<*- 2 > 77 V + Vm/ 


and so, for ] z | < C -Y™: we have 


(8) 


«*\ = v (^ljV&g , A tZ *- 2 
\ /to/ r—o vim" 11 m* (MI ' 





Separate now the integral in (7) into two parts: 

h = / , /. = / 

•'1*1 ^ey/m/o •'III 




Now 


\h\<( ^ *'(«) +E 


(-iypl(z) 

m’ n 


dz. 


Evidently this last integral is exponentially small and so is $A_k/m^{\frac12(k-2)}$. By (8),

-£® s ^H+g s =S^)*+sfe.- 

Combining these results we obtain 

** - £(§ b ^ £ )(*' w +§ 1 yf 5 4 

Jfc—S J y k —3 *—3 ji J h J 

r i V 1 V 1 V' C* » . 




J3 m'« 7,1 + h h mi 5+») + 

= E + E + 


^4* 




where

$$I_{\alpha\beta} = \int_{-\infty}^{\infty} z^\alpha\, \Phi^{(\beta)}(z)\,dz.$$

Now the following facts can easily be established by means of partial integration:

(9) $I_{\alpha\beta} = 0$ when $\alpha - \beta$ is even,

(10) $I_{\alpha\beta} = 0$ when $\beta - \alpha > 1$.

By (9), the non-vanishing terms in $\sum_1$ are the even terms and the non-vanishing
terms in $\sum_2$ are those for which $\mu + \nu$ is even. Hence


E = 

i 

E = 


E 


g» fe y 


[*(*-3)] l*(*—3)] 2p . 

rV v e ^ v 

JLm- Jmmd LmJ 

F-0 m* 1 J-l W 


/2y, 


2M+2i+l 



g J>» j lP+l j 
m M+r+l ^ 2,,+1, 
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Using (10) to reduce further we get 
2 


E [*(*-*)] M2, . ttCfc-4)] i-l 2M+1 / t 

- JL L L TT^hT *2p,2m-W+1 -T L 2u L 

2 r-2 M-l ;-l 77T^ f-1 M -0 3-1 77l M+r+A 


y^ tin* s^y-H i y^ y^ ?2*+3 

Co m M+r+a Co Co 


[|(*-9)1 1 « [*(*-«)] 1 a 4 

“ 23 23 k*€20+4 + 2 23 ft«0&0+3 + ~4^ i) 

a—0 771®^ 0-U(a+l)l a—0 Wl m |8-[i(a+l) ] 

[!(*-«)] i t-3 [*(*-3)] 1 .*-2 a 

= 23 23 + 23 ~ 23 ^;&/+3 + » 

" m‘,-[j(7-2)] " w* ._riTr ,M 2) 


*-2 m l i)i 


Hence 


? + £- & + '-£ + + " '£’" i 


( r-3 »-2 , \ 

+ 23 h' ^2r-f-4 + 23 ^?2r+s)+ 

M—[f(V—2) ] M—[*(•'—2)] / 


fc-3 1 2v j 

fo + 23 —; 23 Vi'$3 + ~\Tk-€) • 

Cl m V jmmf+l m* Kk l) 


Hence 


F( x ) = Jo + 23 “ 23 Pj>fi + -J 73 


Our final conclusion is: Under the conditions (II1)-(II5) formula (11) is true;
if $k = 3$, (11) remains true without the condition (II2).



ON THE DISTRIBUTION OF THE SERIAL CORRELATION COEFFICIENT 

By Herman Rubin 

Cowles Commission for Research in Economics, University of Chicago

The distribution of the serial correlation coefficient, in samples drawn from
a parent distribution with zero serial correlation, has been studied by many
authors. Anderson [1] obtained the exact distribution. Dixon [3] and Koopmans [4]
have given approximate distributions, each attained by smoothing the
characteristic values of the numerator of $\bar r$ in (1) below. Dixon smoothed the
characteristic values in the generating function and obtained his results by
comparing the moments of the exact distribution with those of the approximation,
of which the first $T$ are found to be exact. Koopmans smoothed the
characteristic values in the exact distribution function. Here we evaluate
Koopmans' result and show that it is the same as Dixon's approximation. It
thus appears that in this case it is immaterial whether the characteristic values
are smoothed before or after inverting the characteristic function. We also
add tables comparing confidence limits for the exact distribution, for the
approximation referred to, and for a normal approximation.

We define the serial correlation coefficient as

(1) $\bar r = \frac{\sum_{t=1}^{T} x_t x_{t+1}}{\sum_{t=1}^{T} x_t^2}, \qquad x_{T+1} = x_1.$

Then Koopmans obtains, if the true value $\rho$ of $\bar r$ equals 0, and the $x_t$ are
normally and independently distributed with mean 0 and variance $\sigma^2$, the
approximate distribution

(2) $h(\bar r, T) = \frac{(\frac12 T - 1)\, 2^{\frac12 T}}{\pi} \int_0^{\arccos \bar r} (\cos\alpha - \bar r)^{\frac12 T - 2} \sin \tfrac12 T\alpha\, \sin\alpha\,d\alpha.$

Although in the distribution problem $T$ is a positive integer, it is useful to
consider the right-hand member of (2) as the definition of $h(\bar r, T)$ for those
complex values of $T$ for which it exists.

Let $R(T)$ denote the real part of $T$. If $R(T) > 2N + 2$, we obtain

(3) $\frac{\partial^N}{\partial \bar r^N} h(\bar r, T) = (-1)^N\, \frac{(\frac12 T - 1)\, 2^{\frac12 T}}{\pi}\, (\tfrac12 T - 2)(\tfrac12 T - 3) \cdots (\tfrac12 T - N - 1) \int_0^{\arccos \bar r} (\cos\alpha - \bar r)^{\frac12 T - 2 - N} \sin \tfrac12 T\alpha\, \sin\alpha\,d\alpha.$

Now, according to [2], tables 41, 42,

(4) $\int_0^{\pi/2} (\cos\alpha)^{\frac12 T - 2 - N} \sin \tfrac12 T\alpha\, \sin\alpha\,d\alpha = \frac{\frac12 T \pi\, \Gamma(\frac12 T - N - 1)}{2^{\frac12 T - N}\, \Gamma(\frac12 (T - N + 1))\, \Gamma(\frac12 (1 - N))}.$

Denote by $h^{(N)}(0, T)$ the value of $\frac{\partial^N}{\partial \bar r^N} h(\bar r, T)$ for $\bar r = 0$. Then for $R(T) > 2N + 2$,

(5) $h^{(N)}(0, T) = \frac{(-1)^N\, 2^N\, \Gamma(\frac12 T + 1)}{\Gamma(\frac12 (T - N + 1))\, \Gamma(\frac12 (1 - N))}.$

$h(\bar r, T)$ is analytic in $\bar r$ for $|\bar r| < 1$, $R(T) > 2$, and is analytic in $T$ for $|\bar r| < 1$,
$R(T) > 2$. It follows by Hartogs's theorem [5] that $h(\bar r, T)$ is analytic in $\bar r$
and $T$ for $|\bar r| < 1$, $R(T) > 2$. By analytic continuation we get that (5) holds
for $R(T) > 2$. Consequently

(6) if $N$ is odd, $h^{(N)}(0, T) = 0$;

(7) if $N$ is even,

$$\frac{h^{(N)}(0, T)}{h(0, T)} = \frac{2^N\, \Gamma(\frac12 T + \frac12)\, \Gamma(\frac12)}{\Gamma(\frac12 (T - N + 1))\, \Gamma(\frac12 (1 - N))}.$$

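Let $N = 2P$, then

(8) $\frac{1}{(2P)!}\, \frac{h^{(2P)}(0, T)}{h(0, T)} = \frac{2^{2P}}{(2P)!} \left( \frac{T - 1}{2} \right) \left( \frac{T - 3}{2} \right) \cdots \left( \frac{T - 2P + 1}{2} \right) (-1)^P\, \frac{1 \cdot 3 \cdots (2P - 1)}{2^P}$

$$= \frac{(-1)^P}{(2P)!} \left( \frac{T - 1}{2} \right) \left( \frac{T - 3}{2} \right) \cdots \left( \frac{T - 2P + 1}{2} \right) \frac{(2P)!}{P!},$$

which is the coefficient of $\bar r^{2P}$ in the expansion of $(1 - \bar r^2)^{\frac12 (T - 1)}$.

According to (5)

(9) $h(0, T) = \frac{\Gamma(\frac12 T + 1)}{\Gamma(\frac12 T + \frac12)\, \Gamma(\frac12)}.$

Hence

(10) $h(\bar r, T) = \frac{\Gamma(\frac12 T + 1)}{\Gamma(\frac12 T + \frac12)\, \Gamma(\frac12)}\, (1 - \bar r^2)^{\frac12 (T - 1)},$

which is the same as Dixon's expression (3.22).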
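
Confidence limits under (10) can be obtained numerically by integrating the density over an upper tail and solving for the cut-off point. The following is a minimal sketch in Python using only the standard library; the function names, Simpson's rule and bisection are choices made here for illustration and are not the methods used in the paper. The normal comparison uses a standard deviation of $1/\sqrt{T + 2}$, which is what (10) implies for $\bar r$; that value is computed here and is not quoted from the text.

```python
from math import gamma, sqrt
from statistics import NormalDist


def density(r, T):
    """Density (10): constant times (1 - r^2)^((T - 1)/2) on [-1, 1]."""
    const = gamma(T / 2 + 1) / (gamma(T / 2 + 0.5) * gamma(0.5))
    return const * max(0.0, 1.0 - r * r) ** ((T - 1) / 2)


def upper_tail(c, T, steps=2000):
    """Integral of density (10) from c to 1 by Simpson's rule (steps even)."""
    h = (1.0 - c) / steps
    total = density(c, T) + density(1.0, T)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(c + i * h, T)
    return total * h / 3.0


def limit_from_density(T, level):
    """Point c with upper-tail probability `level` under (10), by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if upper_tail(mid, T) > level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def limit_from_normal(T, level):
    """Same limit for a normal curve with mean 0 and variance 1/(T + 2)."""
    return NormalDist().inv_cdf(1.0 - level) / sqrt(T + 2)


if __name__ == "__main__":
    for T in (3, 15, 25):
        print(T, round(limit_from_density(T, 0.05), 3),
              round(limit_from_normal(T, 0.05), 3))
```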

A more elementary proof by complete induction for integral values of $T$ can
be based on the recurrent differential equation (14) which is of interest in itself.
To this end we shall write (2) in a different form which is easily obtained through
partial integration.

(11) $h(\bar r, T) = \frac{T\, 2^{\frac12 T - 1}}{\pi} \int_0^{\arccos \bar r} (\cos\alpha - \bar r)^{\frac12 T - 1} \cos \tfrac12 T\alpha\,d\alpha.$

Differentiating with respect to $\bar r$,


l'{f, D = - * r( * r ar — (cos a - cos i Tada. 

= - iT(Fr D2 >r r - f (C08tt _^ (cosi(r _ 2)acos « 


sin J(5T — 2) a sin a) da 


17 Y 17 7 _ i \oi^ /*»rc cos r 

I—*— - (cos a — F) ifwl cos i(T — 2)a da 

171/177 /»* ro cos * 

4 (4 ^ [ (cos a - f) Jr - 2 cos J(r - 2)a da 

IT JO 

W-i)2 ir r“'^ -njt-2 


,lT /*»rc cob f 


*»rc cob r 

/ (cos a — f)* 

Jo 


•sin £(T — 2)a sin a da 


inir - i)2 4 f 


\T - r arc cob f 


/»»rc cob r 

/ (cos a — r) 1 

Jo 


•cos J(T — 2)a da, 

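because the first and third terms in (12) cancel as may be shown by integrating
by parts.

Hence (13) reduces to the recurrent differential equation

(14) $h'(\bar r, T) = -T\, \bar r\, h(\bar r, T - 2).$

Let us now assume that

(15) $h(\bar r, T - 2) = \frac{\Gamma(\frac12 T)}{\Gamma(\frac12 (T - 1))\, \Gamma(\frac12)}\, (1 - \bar r^2)^{\frac12 (T - 3)}.$

Then (14) becomes

(16) $h'(\bar r, T) = -2 \bar r \cdot \tfrac12 T\, \frac{\Gamma(\frac12 T)}{\Gamma(\frac12 (T - 1))\, \Gamma(\frac12)}\, (1 - \bar r^2)^{\frac12 (T - 3)} = -2 \bar r \cdot \tfrac12 (T - 1)\, \frac{\Gamma(\frac12 T + 1)}{\Gamma(\frac12 (T + 1))\, \Gamma(\frac12)}\, (1 - \bar r^2)^{\frac12 (T - 1) - 1}.$

Integrating, one obtains

(17) $h(\bar r, T) = \frac{\Gamma(\frac12 T + 1)}{\Gamma(\frac12 (T + 1))\, \Gamma(\frac12)}\, (1 - \bar r^2)^{\frac12 (T - 1)}.$

No constant of integration occurs because (17) agrees with (5) for $\bar r = 0$ and
$N = 0$.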
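It remains to prove the validity of (17) for the initial values $T = 3$ and $T = 4$.
If $T = 4$

(18) $h(\bar r, 4) = \frac{4}{\pi} \int_0^{\arccos \bar r} \sin 2\alpha\, \sin\alpha\,d\alpha = \frac{8}{3\pi} \Big[ \sin^3 \alpha \Big]_0^{\arccos \bar r} = \frac{8}{3\pi}\, (1 - \bar r^2)^{\frac32} = \frac{\Gamma(3)}{\Gamma(\frac52)\, \Gamma(\frac12)}\, (1 - \bar r^2)^{\frac12 (4 - 1)}.$

For $T = 3$,

(19) $h(\bar r, 3) = \frac{\sqrt2}{\pi} \int_0^{\arccos \bar r} \frac{\sin \frac32 \alpha\, \sin\alpha}{\sqrt{\cos\alpha - \bar r}}\,d\alpha.$

Substitute $\cos\alpha = \bar r + (1 - \bar r) \sin^2\theta$. We get

(20) $h(\bar r, 3) = \frac{2(1 - \bar r)}{\pi} \int_0^{\pi/2} \left\{ (1 + 2\bar r) \cos^2\theta + 2(1 - \bar r) \sin^2\theta \cos^2\theta \right\} d\theta = \tfrac34 (1 - \bar r^2) = \frac{\Gamma(\frac52)}{\Gamma(2)\, \Gamma(\frac12)}\, (1 - \bar r^2)^{\frac12 (3 - 1)},$

which completes the proof.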

A short table of confidence limits is included, corresponding to the 5% and
1% significance levels, comparing the exact distribution given by Anderson [1]
(the values in parentheses being graphically interpolated by him), the
distribution (10), and the normal curve with the same mean and standard deviation.


Confidence limits for $\bar r$

            5%                           1%
  T    Exact    (10)    Normal     Exact    (10)    Normal
  3    .864     .729    .736       .970     .882    1.040
  4    .713     .669    .672       .898     .833    .950
  5    .622     .621    .622       .823     .789    .879
  6    .570     .582    .582       .762     .750    .823
  7    .545     .549    .548       .714     .715    .775
  8    (.521)   .521    .520       (.682)   .685    .736
  9    .498     .497    .496       .656     .658    .701
 10    (.477)   .476    .475       (.633)   .634    .672
 11    .457     .458    .456       .612     .612    .645
 15    .400     .400    .399       .543     .543    .564
 20    (.351)   .352    .351       (.480)   .482    .496
 25    .317     .317    .317       .437     .437    .448
 30    (.291)   .291    .291       (.404)   .403    .411
 35    (.271)   .271    .270       (.377)   .376    .382
 40    (.255)   .254    .254       (.355)   .354    .359
 45    .240     .240    .240       .335     .335    .339
.339 
It is thus seen that the distribution (10) provides satisfactory significance levels
for $T > 9$ whereas the normal approximation provides satisfactory 5%
significance levels for the same range. The normal approximation appears to be
unsatisfactory, however, at the 1% significance level even for $T$ as high as 45.
The normal approximation here used is not the same as that used by Anderson
([1], p. 53), which assumes $\frac{\sqrt T\, \bar r}{\sqrt{1 + 2\bar r^2}}$ to be normally distributed.

The following table shows a comparison between a few more confidence limits
of the Type II curve (10) and the normal curve with the same first two moments
for a few values of $T$.


Confidence limits for $\bar r$

           5%              4%              3%              2%              1%
  T    (10)  Normal    (10)  Normal    (10)  Normal    (10)  Normal    (10)  Normal
 15    .400   .399     .423   .425     .452   .456     .488   .498     .543   .564
 20    .352   .351     .373   .373     .398   .401     .431   .438     .482   .496
 25    .317   .317     .336   .337     .360   .362     .390   .395     .437   .448
NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


A NOTE CONCERNING HOTELLING’S METHOD OF INVERTING A 
PARTITIONED MATRIX 

By F. V. Waugh 

War Food Administration, Washington 

Professor Hotelling recently presented several methods of computing the
inverse of a matrix.$^1$ Among these was a method of partitioning a square matrix
of $2p$ rows into four square matrices, $a$, $b$, $c$ and $d$, of $p$ rows each, resulting in
the partitioned matrix,

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

The inverse of this matrix can also be written as a partitioned matrix,

$$\begin{bmatrix} A & C \\ B & D \end{bmatrix}.$$

Then, multiplying the original matrix by its inverse we get four matrix equations,

$$aA + bB = 1, \qquad aC + bD = 0,$$

$$cA + dB = 0, \qquad cC + dD = 1.$$

These equations can be solved for $A$, $B$, $C$, and $D$.

Professor Hotelling's solution requires the inversion of four $p$-rowed matrices.
It is possible, however, to solve these equations by formulas involving only two
inversions. The formulas are

$$D = (d - ca^{-1}b)^{-1}, \qquad B = -Dca^{-1},$$

$$C = -a^{-1}bD, \qquad A = a^{-1} - a^{-1}bB.$$
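
The four formulas can be sketched in a few lines of code. The following is a minimal illustration, not part of the original note: the helper names (`matmul`, `matsub`, `negate`, `inverse`, `waugh_inverse`) are hypothetical, and the Gauss-Jordan routine used for the two small inversions is an assumption chosen only to keep the sketch self-contained.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def matsub(x, y):
    """Elementwise difference x - y."""
    return [[xv - yv for xv, yv in zip(xr, yr)] for xr, yr in zip(x, y)]

def negate(x):
    """Elementwise negation."""
    return [[-v for v in row] for row in x]

def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (any small inverter would do)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def waugh_inverse(a, b, c, d):
    """Blocks A, B, C, D of the inverse, using only two p-rowed inversions."""
    a_inv = inverse(a)                            # first inversion
    a_inv_b = matmul(a_inv, b)
    D = inverse(matsub(d, matmul(c, a_inv_b)))    # second inversion
    B = negate(matmul(matmul(D, c), a_inv))
    C = negate(matmul(a_inv_b, D))
    A = matsub(a_inv, matmul(a_inv_b, B))
    return A, B, C, D
```

The full inverse is then assembled with $A$ and $C$ in the upper half and $B$ and $D$ in the lower half, matching the partitioned form given above.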

As an example of the procedure let the given matrix be

$$\begin{bmatrix} 26 & -10 & 15 & 32 \\ 19 & 45 & -14 & -8 \\ -12 & 16 & 27 & 13 \\ 32 & 29 & -35 & 28 \end{bmatrix}.$$


1 Harold Hotelling. “Some new methods of matrix calculation,” Annals of Math.
Stat., Vol. 14 (1943), pp. 1-34.

The necessary steps in computation are

$$a^{-1} = \begin{bmatrix} .03309 & .00735 \\ -.01397 & .01912 \end{bmatrix}, \qquad a^{-1}b = \begin{bmatrix} .39345 & 1.00008 \\ -.47723 & -.60000 \end{bmatrix},$$

$$ca^{-1} = \begin{bmatrix} -.62060 & .21772 \\ .65375 & .78968 \end{bmatrix}, \qquad ca^{-1}b = \begin{bmatrix} -12.35708 & -21.60096 \\ -1.24927 & 14.60256 \end{bmatrix}.$$

Note that a convenient check at this point is to compute both $(ca^{-1})b$ and $c(a^{-1}b)$.

$$d - ca^{-1}b = \begin{bmatrix} 39.35708 & 34.60096 \\ -33.75073 & 13.39744 \end{bmatrix},$$

$$(d - ca^{-1}b)^{-1} = D = \begin{bmatrix} .00790 & -.02041 \\ .01991 & .02322 \end{bmatrix},$$

$$-a^{-1}bD = C = \begin{bmatrix} -.02302 & -.01519 \\ .01572 & .00419 \end{bmatrix},$$

$$-Dca^{-1} = B = \begin{bmatrix} .01825 & .01440 \\ -.00282 & -.02267 \end{bmatrix},$$

$$a^{-1} - a^{-1}bB = A = \begin{bmatrix} .02873 & .02436 \\ -.00696 & .01239 \end{bmatrix}.$$

The last four of these matrices are the four parts of the inverse, which can be
written

$$\begin{bmatrix} .02873 & .02436 & -.02302 & -.01519 \\ -.00696 & .01239 & .01572 & .00419 \\ .01825 & .01440 & .00790 & -.02041 \\ -.00282 & -.02267 & .01991 & .02322 \end{bmatrix}.$$


The accuracy of the computations can be checked by multiplying the original
matrix by the computed inverse matrix. The product should, of course, be a
close approximation of the identity matrix. If further accuracy is called for
we can use Hotelling's iterative formula,

$$C_1 = C_0(2 - AC_0)$$

where $C_0$ is the estimated inverse; $A$ is the original matrix; and $C_1$ is a second
approximation of the inverse.
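
As a rough illustration of the iterative formula, the sketch below applies one step to the five-decimal inverse printed in this note and compares how far $AC$ lies from the identity before and after. The function names `refine_inverse` and `residual` are hypothetical, and the "2" of the formula is read as twice the identity matrix.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def refine_inverse(a, c0, steps=1):
    """Apply C_1 = C_0 (2I - A C_0) the requested number of times."""
    n = len(a)
    c = [row[:] for row in c0]
    for _ in range(steps):
        ac = matmul(a, c)
        two_minus_ac = [[(2.0 if i == j else 0.0) - ac[i][j] for j in range(n)]
                        for i in range(n)]
        c = matmul(c, two_minus_ac)
    return c

def residual(a, c):
    """Largest absolute entry of A C - I, a simple measure of inverse quality."""
    ac = matmul(a, c)
    n = len(a)
    return max(abs(ac[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

# The example matrix of the note and its inverse as printed to five decimals.
ORIGINAL = [[26, -10, 15, 32],
            [19, 45, -14, -8],
            [-12, 16, 27, 13],
            [32, 29, -35, 28]]
PUBLISHED_INVERSE = [[0.02873, 0.02436, -0.02302, -0.01519],
                     [-0.00696, 0.01239, 0.01572, 0.00419],
                     [0.01825, 0.01440, 0.00790, -0.02041],
                     [-0.00282, -0.02267, 0.01991, 0.02322]]

before = residual(ORIGINAL, PUBLISHED_INVERSE)
after = residual(ORIGINAL, refine_inverse(ORIGINAL, PUBLISHED_INVERSE))
```

Since $1 - AC_1 = (1 - AC_0)^2$, each step roughly squares the error, provided the starting estimate is already close.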



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Professor W. G. Cochran of Iowa State College has gone overseas as a consultant
for the United States War Department.

Professor A. R. Crathorne of the University of Illinois has retired with the 
title of Professor Emeritus. 

Professor William Feller of Brown University has been appointed Professor 
of Mathematics at Cornell University, Ithaca, New York, as of July 1, 1945. 

Associate Professor Joe J. Livers has returned to Montana State College at 
Bozeman after receiving his doctorate in February at the University of Michigan. 

Assistant Professor W. A. Yezeau of the University of Detroit has been appointed
Assistant Professor of Mathematics at St. Louis University.

Associate Professor S. S. Wilks of Princeton University has been promoted to 
a professorship. 

The American Statistical Association elected ten Fellows during 1944. Of 
these ten, five are members of the Institute. They are A. E. Brandt, W. G. 
Cochran, Gertrude M. Cox, Alan Treloar, and Sewall Wright. The President 
of the Association is Dr. Walter A. Shewhart, a charter member of the Institute 
and its President during 1944. 


New Members 

The following persons have been elected to membership in the Institute: 

Allendoerfer, Asso. Prof. Carl B. Ph.D. (Princeton) Haverford College, Haverford, Pa.
Beckstead, Lt. (j.g.) Gordon L. M.S. (Michigan) Aerologist, U.S. Navy. Aerology, Navy
#151, c/o Fleet Post Office, San Francisco, Calif.

Berman, Abraham J. M.A. (Brooklyn) Statistician. 1460 College Avenue, New York,
N. Y.

Bigelow, Julian H. Asso. Director, Statistical Research Group, Columbia University.
401 West 118th St., New York 27, N. Y.

Bowen, Earl K. A.M. (Boston) Instr. Math. Northeastern Univ., Boston, Mass. On
military leave—Scientific Consultant, Office of Field Service, O.S.R.D. 6 Sibley Ave.,
W. Springfield, Mass.

Canter, Stanley D. B.S. (Coll. City of N. Y.) Statistician, Lerner Shops, Inc., New York,
N. Y. 2676 Morris Ave., The Bronx, 68, New York, N. Y.

Cohen, Karl. Ph.D. (Columbia) Physicist, Standard Oil Development Co. Esso Laboratories,
Research Division, P. O. Box 243, Elizabeth B, N. J.

Cooper, William W. A.B. (Chicago) Instr. in Economics, University of Chicago. 6539 S. 
Ellis Ave., Chicago 37, Ill. 

Davidson, James H. B.S. (Norwich Univ.) Research Physicist, Hercules Powder Co. 
Box 844, Christiansburg, Va. 

Epstein, Benjamin Ph.D. (Illinois) Staff Assistant, Westinghouse Electric & Mfg. Co.,
Quality Control Dept., Rm. 3-A-17, East Pittsburgh, Pa.

Gauthier, Prof. Abel A.M. (Columbia) Prof. of Mathematics, Université de Montréal,
2900 Mount Royal Blvd., Montreal, Canada.

Geraten, Lydia Blumenthal B.A. (Hunter) Res. Stat. 1001 Lincoln Place, Brooklyn 13,
N. Y.

Goffman, Casper Ph.D. (Ohio State) Staff Asst., Quality Control Dept., Westinghouse 
Elec. & Mfg. Co., Rm. 3-A-17, East Pittsburgh, Pa. 

Hastay, Millard W. B.A. (Reed) Asso. Math., Stat. Res. Group, Columbia University. 
401 West 118th St., New York 27, N. Y. 

Houseman, Earl E. M.A. (South Dakota) Head Sampling Sec., Stat., Division of Program 
Surveys, Bur. of Agric. Econ., Washington 25, D. C. 

James, R. W. M.A. (Toronto) Asst. to Director, Washington Div., Wartime Prices &
Trade Board. Room 3068, Railroad Retirement Bldg., Washington, D. C.

Jones, Robert Richard, Jr. A.B. (Columbia) 61 Jackson St., New Rochelle, N. Y.

Kac, Asst. Prof. Mark Ph.D. (John Casimir Univ., Lwow) Math. Dept., Whitehall, 
Cornell University, Ithaca, N. Y. 

Knoepfel, Margaret F. A.B. (Brooklyn) Jr. Stat., Weather Bureau, Washington, D. C. 
SS06 Ely Place, S.E., Washington 19, D. C. 

Ladd, Robert Boyd M.A. (Texas Coll. of Arts & Industries) Stat. Consultant, OCT,
Transport Economics, Traffic Control Div., War Dept., Washington, D. C. 908 Wade
Ave., Rockville, Md.

Larson, Charles M. B.Sc. (Nebraska) Stat. Analyst, Northrop Aircraft, Inc. 8144 West
125th St., Hawthorne, Calif.

Lesansky, William A. B.B.A. (City Coll. of N. Y.) Stat., War Dept., Washington, D. C.
1841 Summit Place, N.W., Washington 9, D. C.

Lewis, Wyatt H. B.S. (Calif. Inst. of Tech.) Quality Control Engineer. 212 East H
Street, Ontario, Calif.

Mathisen, Ensign Harold C. A.B. (Princeton) Ensign, USNR. 59 Fernwood Road, East
Orange, N. J.

Miller, Robert Carmi Res. Engineer, Elgin National Watch Co., Elgin, Ill.

Mittra, Probodh Chandra B.Sc. (India) Grad. Student in Math. Stat., Columbia University,
New York 27, N. Y.

Neumann, Prof. John von Ph.D. (Budapest) Institute for Advanced Study, Princeton,
N. J.

Noland, Asst. Prof. E. William Ph.D. (Cornell) Dept. of Sociology & Anthropology,
McGraw Hall, Cornell University, Ithaca, N. Y.

Okun, Yetta Edith B.A. (Hunter) Res. Asst., Dept. of Labor, Washington, D. C. 2120
16th St., N.W., Washington 9, D. C.

Owen, F. V. Ph.D. (Wisconsin) Geneticist, U. S. Dept. of Agric. 1810 S. Main St., Salt
Lake City, Utah.

Poston, Paul Lehman B.S. (California) Statistician. George Washington Carver Hall,
211 Elm St., Washington D. C.

Rice, William B. A.B. (Davidson) Director, Dept. of Stat. & Reports, Plomb Tool Co.
906 Baldwin Ave., El Monte, Calif.

Rudnicki, Alex. B.S. (City Coll. of N. Y.) Grad. Student in Math. Stat. 1072 Lorimer
St., Brooklyn 22, N. Y.

Rupp, William B. Mgr., Quality Control Dept., RCA Victor Div., Radio Corp. of America,
Harrison, N. J. 29 Dodd St., East Orange, N. J.

Savage, Leonard J. Ph.D. (Michigan) Res. Math., Stat. Res. Group, Columbia University,
401 West 118th St., New York 27, N. Y.

Sheppard, David B.S. (Yale) Statistician, Army Air Forces. 2721 Terrace Road, S.E., 
Washington 20, D. C. 

Smith, Prof. James Gerald Ph.D. (Princeton) Prof. of Economics, Princeton University.
80 Murray Place, Princeton, N. J.

Stigler, Prof. George J. Ph.D. (Chicago) Prof. of Economics, Member, Res. Staff, National
Bureau of Econ. Res., University of Minnesota, Minneapolis, Minn.

Weingarten, Harry M.A. (Columbia) Math. Teacher, School of Aviation Trades. 1880
Morris Ave., Bronx 56, N. Y.

Weinstein, Joseph M.S. (C.C. N. Y.) Res. Analyst, Vacuum Tube Tests & Standardization,
Camp Evans Signal Lab. Signal Corps. 18 Washington Village, Asbury Park, N. J.

Westman, A. E. R. Ph.D. (Toronto) Dir. of Chem. Res., Ontario Research Foundation,
43 Queen’s Park, Toronto 5, Canada.

Wilcox, Sidney W. L.B. (California) Chief Stat., Bur. of Labor Stat. Room 2318, Dept.
of Labor, Washington 25, D. C.

Young, Captain Chen-Pang B.A. (National Tsing Hua Univ., China) Ordnance Dept.,
Chinese Army. 2811 Massachusetts Ave., N.W., Washington 8, D. C.

Corrections to the Directory Published in the December 1944 Issue 

The name of Dr. Walter Schilling was omitted from the Directory. It should 

have appeared as follows: 

Schilling, Walter M.D. (Harvard) Asst. Clinical Professor of Medicine, 

Stanford University Hospital, San Francisco 15, California. 

The name of Professor Godfrey H. Thomson, Director of the Training of 

Teachers, University of Edinburgh, Edinburgh, Scotland, was misspelled. 



CHOICE OF ONE AMONG SEVERAL STATISTICAL HYPOTHESES 

By Ralph J. Brookner 1 
New York City 

1. Introduction. Statistical decision is a term which we will apply to that
phase of statistical inference which deals with the following question. Consider
one or several variates whose distribution function depends on one or
several unknown parameters; suppose there be given a finite number of mutually
exclusive hypotheses regarding the parameters, whose totality completely exhausts
every possibility. If a sample of observations on the variates is made,
the choice of one of the given hypotheses on the basis of that sample is called a
statistical decision. In other words, to make a statistical decision is to give a
procedure which will divide the sample space into as many regions as there are
given hypotheses, and to set up a one-to-one correspondence between these
regions and the hypotheses so that if the sample point lies in any particular
region, the corresponding hypothesis is chosen.

This notion is quite closely connected with both of the fields of statistical
inference that have engaged most of the modern statistical theorists. On the one
hand, it may be considered a generalization of the notion of testing hypotheses,
for in this theory, one gives a procedure which divides the sample space into a
region of rejection and a region of non-rejection of a given null hypothesis.
Then one makes either of two decisions depending upon which of the regions
contains the sample point. On the other hand, the theory of estimation is a
generalization of the notion of statistical decision in which the number of alternatives
is not restricted to be finite.

As in any phase of statistical inference, our primary aim is to define broad
principles upon which “good” or “best” procedures for making statistical decisions
may be based. The general problem of statistical decisions has been formulated
by A. Wald, who has also proposed a principle on which the solution can
be based. We are interested, however, in several of the simpler but important
particular problems in which quite serious calculation difficulties are encountered
in actually finding Wald’s solution. Hence, we will propose in its stead another
principle which quite closely resembles Wald’s for selecting a solution of the
problem of statistical decision.

It may be pointed out immediately that, from a purely logical point of view, 
the substitute principle we shall offer will probably be considered to be less 
acceptable than its predecessor. We will find, however, by considering its 
application to some of the well known problems of testing hypotheses, that the 
principle is at least reasonable in leading to certain well accepted results. 

1 Research under a grant-in-aid of the Carnegie Corporation of New York.

2. Principle determining the “best” procedure. We will first discuss briefly
Wald's principle, and then define the criterion that we will employ by
pointing out the differences. A much more general formulation
is possible [1], [2], but we will discuss the principle as it will be directly
applied to the problems of statistical decisions when the number of hypotheses
is finite.

Consider the variates $x_1, x_2, \ldots, x_p$ whose probability density function
$f(x_1, x_2, \ldots, x_p \mid \theta_1, \theta_2, \ldots, \theta_k)$ is known except for the unknown values of
the parameters $\theta_1, \theta_2, \ldots, \theta_k$. We denote by $\theta$ a point in $k$-dimensional
space whose coordinates are $(\theta_1, \theta_2, \ldots, \theta_k)$ and shall speak of this parameter
space as $\Omega$. Suppose that $\omega$ is any subset of $\Omega$ and that $S$ represents a system
of finitely many such sets which are mutually disjunct and which cover $\Omega$.
Each element, $\omega_0$, of $S$ corresponds to a hypothesis $H_{\omega_0}$, which is the hypothesis
that $\theta$ is a point of $\omega_0$, and the system of all such hypotheses corresponding to $S$
we denote by $H_S$.

A sample of $N$ observations on $x_1, x_2, \ldots, x_p$ is drawn and the sample may be
considered as a point, $E$, in the $pN$-dimensional sample space; denote the sample
space by $M$. We want to decide on the basis of the point $E$ which of the hypotheses
of $H_S$ should be accepted. That is, we seek a procedure by which the
sample space may be divided into a system of mutually exclusive regions $M_\omega$
which are the same in number as the number of elements of $S$, and by which a
correspondence is set up so that the falling of the sample point into a particular
$M_{\omega_0}$ shall cause us to accept a particular hypothesis $H_{\omega_0}$ as the true one. If
the totality of regions $M_\omega$ be denoted $M_S$, it is necessary to give a principle by
which we may prefer a particular system $M_S$ over any other system $M_S'$.

Wald introduces the notion of a weight function of errors, a function of the
parameters and of the decision made, which might well be defined as the loss
incurred if $\theta$ be the true parameter point and the sample point falls in $M_\omega$ which
causes us to accept the hypothesis $H_\omega$. Denote the weight function by $W(\theta, \omega_E)$
where $\omega_E$ stands for that hypothesis which we choose if $E$ is the sample point;
then we require that $W(\theta, \omega_E)$ be non-negative, and if $\theta$ lies in $\omega_E$, $W(\theta, \omega_E) = 0$,
for then the correct decision has been made and there is no loss.

Perhaps the notion of a weight function can be most clearly understood, and
its importance appreciated, if we consider the place of statistics in the business
world, where possible losses are often computable in terms of money. The
weight function may be taken to be equal to this loss. Suppose a manufacturing
plant has a process which manufactures a product whose efficiency is a measurable
quantity that we will denote by $x$. Suppose $x$ is a random variable whose distribution
depends only upon its mean value $\theta$, and the company contemplates
renewing its machinery if the mean value of the efficiency falls short significantly
from a particular value $\theta_0$. Then on the basis of a sample of $N$ observations on
$x$, one of two decisions must be reached: the rejection of the hypothesis $\theta \ge \theta_0$
(the decision to renew the machinery), or the non-rejection of $\theta \ge \theta_0$ (the decision
not to renew it). Suppose the region $M_\omega$ is the region of the sample space such
that if $E$ falls into $M_\omega$, we reject $\theta \ge \theta_0$, and $M_{\bar\omega}$ is the complementary region.
Then we may say that the weight function can be defined by

$$W(\theta, \bar\omega) = 0 \quad \text{for } \theta \ge \theta_0$$

$$W(\theta, \bar\omega) = g(\theta) \quad \text{for } \theta < \theta_0$$

$$W(\theta, \omega) = 0 \quad \text{for } \theta < \theta_0$$

$$W(\theta, \omega) = h(\theta) \quad \text{for } \theta \ge \theta_0$$

where $h(\theta)$ is the company’s monetary loss in needlessly changing its machinery
and $g(\theta)$ is a function which expresses the company’s loss in not changing its
process even though the true value of the parameter is $\theta < \theta_0$. The function
$g(\theta)$ may be of almost any form, but it is only reasonable that it should be a
monotonic non-decreasing function of $|\theta_0 - \theta|$, since the loss should, it seems,
increase as the true value of $\theta$ is farther from $\theta_0$.

Wald then defines the risk as the expected value of the loss; since $\theta$ is an unknown,
the risk will be a function of $\theta$, and it will also be a function of the system
$M_S$:

$$r(\theta, M_S) = \int_M W(\theta, \omega_E)\, f(E \mid \theta)\, dE.$$

According to Wald, the “best” system of regions, $M_S$, is that system for which
the maximum of the risk function with respect to the parameter $\theta$ is a minimum
with respect to all possible systems, $M_S$, of regions. Several important properties
are enjoyed by the system of regions defined in this way, though other
reasonable definitions are possible. Perhaps the criterion of minimizing an
average with respect to $\theta$ of $r(\theta, M_S)$ rather than the maximum may be considered
more plausible, but such definitions would raise the question of which
average should be used, and the result obtained by using any particular average
would not be invariant with respect to transformations of the parameter space.

Using the notations as introduced above, and introducing the notation $W(\theta, \omega_i)$
to be the weight function if the $i$th hypothesis is chosen, the principle which
we will use to solve some of the problems of statistical decisions can be given as
follows: In place of the risk function, we consider the $s$ functions

$$R_i(\theta, E) = W(\theta, \omega_i)\, f(E \mid \theta) \qquad (i = 1, 2, \ldots, s)$$

where $f(E \mid \theta)$ is a notation for the probability density, and $s$ is the number of
given hypotheses. If we denote by $R_i(E)$ the least upper bound of $R_i(\theta, E)$
with respect to $\theta$, then we choose the system of “best” regions of acceptance by
including each sample point $E$ in a region $M_i$ determined such that for all $E_0$ in
$M_i$, $R_i(E_0) \le R_j(E_0)$ for all $j \ne i$.

It is interesting to note that a rather general case exists in which the principle
is exactly equivalent with the test of a hypothesis based upon the likelihood
ratio principle. Consider the distribution function $f(x_1, x_2, \ldots, x_p \mid \theta_1, \theta_2, \ldots, \theta_k)$
which is a bounded function of the $x$’s and $\theta$’s. Suppose we are interested
in the test of the hypothesis $(\theta_1, \theta_2, \ldots, \theta_k) \in \omega$ where $\omega$ is a closed
set of points of the parameter space which does not contain any open subset of
the parameter space. Furthermore assume that for each set of $x$’s the distribution
function is continuous in $\theta_1, \ldots, \theta_k$ on an open subset of $\Omega$ containing $\omega$.

We will show that the principle will lead to the test based on the likelihood
ratio if the following is the weight function:

I. If $\omega$ is accepted, the loss is zero if the true parameter point is in $\omega$, and the
loss is a constant $c_1$ if the true parameter point is not in $\omega$.

II. If $\omega$ is rejected (i.e. $\bar\omega$ is chosen), the loss is zero if the true parameter
point is in $\bar\omega$ and is a constant $c_2$ if the true parameter point is in $\omega$.

Consider then the region of the sample space for which $\omega$ is rejected according
to the principle. This region is that for which

$$\text{l.u.b. w.r.t. } \theta \text{ in } \omega \text{ of } [c_2 f(x \mid \theta)] \le \text{l.u.b. w.r.t. } \theta \text{ in } \bar\omega \text{ of } [c_1 f(x \mid \theta)]$$

where we have set $f(x \mid \theta) = f(x_1, x_2, \ldots, x_p \mid \theta_1, \theta_2, \ldots, \theta_k)$, and where l.u.b.
w.r.t. means “least upper bound with respect to.” But the left-hand member
of this inequality is equal to

$$c_2\,[\text{l.u.b. w.r.t. } \theta \text{ in } \omega \text{ of } f(x \mid \theta)]$$

and because of the restriction on $\omega$ and the continuity of $f$, we can see that the
l.u.b. of $f(x \mid \theta)$ with respect to all $\theta$ in $\bar\omega$ must coincide with the l.u.b. of the
function with respect to all $\theta$ in $\Omega$, which is the total parameter space. Thus
we have that the hypothesis $\omega$ is rejected when

$$c_2\,[\text{l.u.b. w.r.t. } \theta \text{ in } \omega \text{ of } f(x \mid \theta)] \le c_1\,[\text{l.u.b. w.r.t. } \theta \text{ in } \Omega \text{ of } f(x \mid \theta)]$$

or when

$$\frac{\text{l.u.b. w.r.t. } \theta \text{ in } \omega \text{ of } f(x \mid \theta)}{\text{l.u.b. w.r.t. } \theta \text{ in } \Omega \text{ of } f(x \mid \theta)} \le \frac{c_1}{c_2}.$$

The left hand member of this inequality is the likelihood ratio statistic introduced
by Neyman and Pearson [3]; hence our test is exactly equivalent with the
likelihood ratio test where the size of the critical region is determined by $c_1$
and $c_2$.
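
The equivalence can be checked numerically in a simple special case. The sketch below is only an illustration under assumptions that are not in the text: the observations are taken to be normal with unknown mean $\theta$ and unit variance, $\omega$ is the single point $\theta = 0$, and the least upper bound over $\bar\omega$ is approximated on a finite grid. All function names are hypothetical.

```python
import math

def likelihood(xs, theta):
    """Joint density of the sample xs under N(theta, 1)."""
    n = len(xs)
    ss = sum((x - theta) ** 2 for x in xs)
    return math.exp(-0.5 * ss) / (2.0 * math.pi) ** (n / 2.0)

def rejects_by_principle(xs, c1, c2, grid):
    """Reject omega = {0} when sup over omega of c2*f is at most sup over its complement of c1*f."""
    r_reject = c2 * likelihood(xs, 0.0)                                # l.u.b. over omega
    r_accept = c1 * max(likelihood(xs, t) for t in grid if t != 0.0)   # grid approximation
    return r_reject <= r_accept

def rejects_by_likelihood_ratio(xs, c1, c2):
    """Reject when the likelihood ratio f(x | 0) / sup f(x | theta) is at most c1/c2."""
    n = len(xs)
    xbar = sum(xs) / n
    lam = math.exp(-0.5 * n * xbar ** 2)   # closed form of the ratio for this model
    return lam <= c1 / c2

# Grid of candidate means from -5 to 5 in steps of 0.01 (an arbitrary choice).
GRID = [i / 100.0 for i in range(-500, 501)]
```

For samples that are not on the boundary of the critical region, the two functions return the same decision; near the boundary the grid approximation may differ slightly from the exact least upper bound.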

We pose the following quite hypothetical example to show circumstances
under which the principle proposed is reasonable. The principle does not
exactly apply as it was stated in terms of probability densities and the example
involves discrete probabilities, but the logic seems somewhat applicable. Suppose
a game is played which consists of the player’s guessing the number of white
balls in an urn known to contain 10 balls, each of which is either white or black,
on the basis of a sample of four drawings with replacements from the urn. Let
us assume that there are eleven mutually exclusive hypotheses (as to the number
of white balls in the urn) to choose among, and the player must make a choice
of one of them after observing the drawing which can give 16 different results.
Assume that the one who plays the game pays a banker a varying sum of money
if he makes a wrong decision and that the banker has the privilege of choosing
the population (i.e. the number of white and black balls originally in the urn).
Now on the basis of the assumption that the banker knows the player’s decision
function and will attempt to fix the population so as to make the player’s expected
loss a maximum, it is clear that Wald’s principle, which minimizes the
maximum loss, leads to the best way to play the game.

Now suppose that instead of one player making the choice among the decisions,
we have 16 players participating in the game and the first player is to
make the choice if, and only if, the drawing is WWWW, the second player if the
drawing is WWWB, and so on, where W stands for the drawing of a white ball
and B for the drawing of a black one. In this case, if player $x$ assumes that the
banker will try to choose the population most unfavorable to him, then his
decision function based on the new principle is the best method of play.
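
The 16-player version of the game can be worked through directly. In the sketch below the sum paid for a wrong guess is taken to be the absolute difference between the guessed and the true number of white balls; this payment schedule, like the function names, is an assumption made only for illustration, since the text leaves the "varying sum" unspecified.

```python
from itertools import product

def likelihood(outcome, whites, total=10):
    """Probability of an ordered drawing (with replacement) given the number of white balls."""
    p = whites / total
    prob = 1.0
    for ball in outcome:
        prob *= p if ball == "W" else 1.0 - p
    return prob

def loss(true_whites, guess):
    """Hypothetical payment to the banker: distance between guess and truth."""
    return abs(true_whites - guess)

def best_guess(outcome, total=10):
    """Guess minimizing the largest value of loss times likelihood over all urn compositions."""
    def worst_case(guess):
        return max(loss(w, guess) * likelihood(outcome, w, total)
                   for w in range(total + 1))
    return min(range(total + 1), key=worst_case)

# One decision for each of the 16 possible ordered drawings.
decisions = {"".join(o): best_guess(o) for o in product("WB", repeat=4)}
```

Each entry of `decisions` is the choice made by the player assigned to that drawing; since the likelihood depends only on the number of white balls drawn, drawings with the same count receive the same guess.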

Although the example indicates that in the usual case which would come up
in practice, Wald’s principle would lead to the better procedure, since the
statistician is usually faced with the necessity of giving a decision no matter
what the sample point is, the new principle is useful since one may hope that in
many practical cases the two principles will not lead to widely varying results,
especially if the sample is large.

3. Application of the criterion to the case of testing the mean of a normal
distribution. Now we will show that the criterion will lead to the widely used
test of “Student’s hypothesis.” Suppose $x$ is known to be distributed normally
with unknown mean $\mu$ and unknown variance $\sigma^2$. On the basis of a sample of $N$
independent observations $x_1, x_2, \ldots, x_N$, “Student’s $t$” is used to test the hypothesis
$\mu = 0$. If $\bar{x}$ is the arithmetic mean of the $N$ observations and $s^2$ the
usual sample estimate of the variance, then with $t = \sqrt{N}\,\bar{x}/s$, the hypothesis
is to be rejected if $|t| \ge t_0$ where $t_0$ is a critical value at some chosen level of
significance $\alpha$ obtained from the distribution of $t$ under the null hypothesis. We
will use the notation $\omega_1$ for the set of points $\mu \ne 0$ and $\omega_2$ for the set of points
$\mu = 0$.

We will consider the problem in reference to the particular weight function
defined as follows:

$$W(\mu, \sigma; \omega_2) = (\mu/\sigma)^k \quad \text{for } \mu \ne 0$$

$$W(0, \sigma; \omega_1) = W$$

$$W(\mu, \sigma; \omega_1) = 0 \quad \text{for } \mu \ne 0$$

$$W(0, \sigma; \omega_2) = 0$$

where as a matter of convenience, we will take $k$ an even positive integer in order
to avoid the introduction of the absolute value of $\mu/\sigma$ which is necessary if $k$
is an odd integer. We also take $k \le N$.

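The density function of the sample of $N$ observations is

$$\frac{C}{\sigma^N}\, e^{-(1/2\sigma^2)\, S(x_\alpha - \mu)^2}$$

where $C$ is a constant. Then the two functions $R_i(\theta, E)$ are

$$R_1(\theta, E) = \frac{WC}{\sigma^N}\, e^{-(1/2\sigma^2)\, S x_\alpha^2} \quad \text{if } \mu = 0$$

$$R_1(\theta, E) = 0 \quad \text{if } \mu \ne 0$$

$$R_2(\theta, E) = \frac{C\mu^k}{\sigma^{N+k}}\, e^{-(1/2\sigma^2)\, S(x_\alpha - \mu)^2} \quad \text{if } \mu \ne 0$$

$$R_2(\theta, E) = 0 \quad \text{if } \mu = 0.$$

To maximize $R_1(\theta, E)$, we set

$$\frac{\partial R_1(\theta, E)}{\partial \sigma} = \left[ -\frac{N}{\sigma^{N+1}} + \frac{S x_\alpha^2}{\sigma^{N+3}} \right] WC\, e^{-(1/2\sigma^2)\, S x_\alpha^2} = 0$$

which gives

$$\sigma^2 = \frac{S x_\alpha^2}{N};$$

hence

$$R_1(E) = \frac{WC\, N^{N/2}}{(S x_\alpha^2)^{N/2}}\, e^{-N/2}.$$

To maximize $R_2(\theta, E)$, we set

$$\frac{\partial R_2}{\partial \mu} = \left[ \frac{k}{\mu} + \frac{S(x_\alpha - \mu)}{\sigma^2} \right] \frac{C\mu^k}{\sigma^{N+k}}\, e^{-(1/2\sigma^2)\, S(x_\alpha - \mu)^2} = 0$$

and

$$\frac{\partial R_2}{\partial \sigma} = \left[ -N - k + \frac{S(x_\alpha - \mu)^2}{\sigma^2} \right] \frac{C\mu^k}{\sigma^{N+k+1}}\, e^{-(1/2\sigma^2)\, S(x_\alpha - \mu)^2} = 0$$

which give the two relations

$$k\sigma^2 = -\mu\, S(x_\alpha - \mu)$$

and

$$(N + k)\sigma^2 = S(x_\alpha - \mu)^2.$$

Then

$$-\mu(N + k)\, S(x_\alpha - \mu) = k\, S(x_\alpha - \mu)^2$$

or

$$\mu^2 - \bar{x}\mu(1 - k/N) - (k/N^2)\, S x_\alpha^2 = 0$$

which gives the maximizing value of $\mu$:

$$\mu^* = \frac{\bar{x}(1 - k/N) \pm \sqrt{\bar{x}^2 (1 - k/N)^2 + (4k/N^2)\, S x_\alpha^2}}{2}$$

and it can easily be shown that the maximum is reached for the value of $\mu^*$
using the $+$ sign when $\bar{x}$ is positive and the $-$ sign when $\bar{x}$ is negative. We will
carry through the case $\bar{x} > 0$ only as the case $\bar{x} < 0$ follows in a similar manner.
We have

$$R_2(E) = \frac{C\, k^{(N+k)/2}}{(\mu^*)^{(N-k)/2}\, [-S(x_\alpha - \mu^*)]^{(N+k)/2}}\, e^{-(N+k)/2}.$$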

To find the region of the sample space for which we should accept the hypothesis
$\mu \ne 0$ (i.e. the critical region for rejection of the hypothesis $\mu = 0$), we
seek those points $E$ for which $R_1(E) \le R_2(E)$, i.e. those for which

$$\frac{W N^{N/2}\, e^{-N/2}}{(S x_\alpha^2)^{N/2}} \le \frac{k^{(N+k)/2}\, e^{-(N+k)/2}}{(\mu^*)^{(N-k)/2}\, [-S(x_\alpha - \mu^*)]^{(N+k)/2}}$$

or for which

$$\frac{(\mu^*)^{(N-k)/2}\, [-S(x_\alpha - \mu^*)]^{(N+k)/2}}{(S x_\alpha^2)^{N/2}} \le c$$

where $c$ is a positive constant. Since both sides of the inequality are positive,
and since $-S(x_\alpha - \mu^*) = N(\mu^* - \bar{x})$, this inequality is equivalent to

$$(1) \qquad \frac{(\mu^*)^{N-k}\, (\mu^* - \bar{x})^{N+k}}{(S x_\alpha^2)^N} \le c_1$$

where $c_1$ is another positive constant.
Now we consider the statistic

$$T^2 = \frac{t^2}{N - 1} = \frac{N\bar{x}^2}{S x_\alpha^2 - N\bar{x}^2} = \frac{N}{S x_\alpha^2/\bar{x}^2 - N}$$

from which we have

$$S x_\alpha^2/\bar{x}^2 = (N/T^2) + N.$$

Also note that

$$2(\mu^*/\bar{x}) = (1 - k/N) + \sqrt{(1 - k/N)^2 + (4k/N^2)(S x_\alpha^2/\bar{x}^2)}$$

(and this is true whether $\bar{x}$ is positive or negative). Now we can write the critical
region (1) as

$$\frac{(\mu^*/\bar{x})^{N-k}\, (\mu^*/\bar{x} - 1)^{N+k}}{(S x_\alpha^2/\bar{x}^2)^N} \le c_1$$

or

$$\left[ 1 - k/N + \sqrt{(1 - k/N)^2 + (4k/N)(1 + 1/T^2)} \right]^{N-k} \left[ 1 + 1/T^2 \right]^{-N}$$

$$\cdot \left[ -1 - k/N + \sqrt{(1 - k/N)^2 + (4k/N)(1 + 1/T^2)} \right]^{N+k} \le c_2$$

where $c_2$ is another positive constant. We denote the left side of this inequality
by $\phi(T^2)$, and it can be shown that $\phi(T^2)$ is a monotone decreasing function of $T^2$.
Thus since the critical region is defined by the relation $\phi(T^2) \le$ constant and
the critical region using “Student’s $t$” is $T^2 \ge$ constant, these procedures are
exactly equivalent.
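
The monotonicity of $\phi(T^2)$ is asserted without proof; a quick numerical check is easy to set up. The sketch below evaluates $\log \phi$ (to avoid overflow for large exponents) on an increasing grid of $T^2$ values. The function names and the particular values of $N$ and $k$ are arbitrary choices for illustration.

```python
import math

def log_phi(t2, n, k):
    """Logarithm of phi(T^2) as defined above, for sample size n and even exponent k <= n."""
    u = 1.0 + 1.0 / t2
    root = math.sqrt((1.0 - k / n) ** 2 + (4.0 * k / n) * u)
    return ((n - k) * math.log(1.0 - k / n + root)
            + (n + k) * math.log(-1.0 - k / n + root)
            - n * math.log(u))

def is_decreasing(n, k, t2_values):
    """True if log phi strictly decreases along the increasing sequence t2_values."""
    values = [log_phi(t2, n, k) for t2 in t2_values]
    return all(later < earlier for earlier, later in zip(values, values[1:]))

# Geometric grid of T^2 values from 0.01 up to a few hundred.
T2_GRID = [0.01 * 1.2 ** i for i in range(60)]
```

A check on a grid is of course no proof, but differentiating $\log \phi$ with respect to $1 + 1/T^2$ gives a positive quantity whenever $k \le N$, which agrees with the numerical behavior.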

4. A problem in statistical decisions. The question which aroused the interest
of the writer in statistical decisions is the following one of multivariate statistical
analysis. Suppose $x_1, x_2, \ldots, x_p$ are known to be normally distributed with
unknown means and unknown variances and covariances, and on the basis of
a set of $N$ independent observations, a test is to be made of the hypothesis
$E(x_1) = E(x_2) = \cdots = E(x_p) = 0$. Such a test may be carried out by using
the generalized Student Ratio [4], and the hypothesis is either to be rejected or
accepted as a whole. But consider the case in which the null hypothesis is
rejected; it seems quite natural to ask for a more enlightening statement. Is it
not possible to say that on the basis of the sample, the hypothesis should be
rejected for $x_{i_1}, x_{i_2}, \ldots, x_{i_k}$ but not rejected for $x_{i_{k+1}}, x_{i_{k+2}}, \ldots, x_{i_p}$? Thus
we seek a division of the sample space into $2^p$ mutually exclusive regions, each
of which will lead us to reject the hypothesis of zero expected values for a particular
set of the $x_i$’s and to accept it for the remaining set.

We will consider a solution of the problem in the case that the covariance
matrix of the joint normal distribution is known, and will motivate that solution
by considering first the case of two variables.

Suppose that $X$ and $Y$ are normally and independently distributed with unknown
means, $\alpha$ and $\beta$, and with unit variances. The joint probability density
function is then of the form

$$f(X, Y) = (1/2\pi) \cdot e^{-\frac{1}{2}[(X - \alpha)^2 + (Y - \beta)^2]}.$$

The set of hypotheses is given as follows:

$H_1$ is the hypothesis that $\alpha = 0$ and $\beta = 0$

$H_2$ is the hypothesis that $\alpha \ne 0$ and $\beta = 0$

$H_3$ is the hypothesis that $\alpha = 0$ and $\beta \ne 0$

$H_4$ is the hypothesis that $\alpha \ne 0$ and $\beta \ne 0$.

We have a sample of $N$ independent pairs of observations $(X_a, Y_a)$ where $a =
1, 2, \ldots, N$; then the density function in the $2N$-dimensional sample space is

$$(1/2\pi)^N \cdot e^{-\frac{1}{2} S[(X_a - \alpha)^2 + (Y_a - \beta)^2]}.$$

We seek the set of regions $M_1$, $M_2$, $M_3$, $M_4$ in the sample space which are
chosen such that if the sample point $E$ falls in $M_i$, we accept the hypothesis $H_i$.
We take the following as the values of the losses if the wrong decision is reached:

I. If $H_1$ is accepted,

i) for any parameter point $(\alpha, \beta)$, the loss is a continuous function of
$(\alpha^2 + \beta^2)$, say $W(\alpha^2 + \beta^2)$, which is zero for $\alpha = \beta = 0$, is differentiable,
strictly monotonically increasing, and possesses a finite maximum
when multiplied by the normal density function.

II. If $H_2$ is accepted,

i) for any parameter point $(\alpha, \beta)$ except $(0, 0)$, the loss is $W(\beta^2)$ where
$W$ is the same function as above,

ii) the loss is $W_1$ if the true parameter point is $(0, 0)$.

III. If $H_3$ is accepted,

i) for any parameter point $(\alpha, \beta)$ except $(0, 0)$, the loss is $W(\alpha^2)$ where
$W$ is the same function as above,

ii) the loss is $W_1$ if the true parameter point is $(0, 0)$.

IV. If $H_4$ is accepted,

i) the loss is $W_2$ if the true parameter point is either $(\alpha, 0)$ for $\alpha \ne 0$,
or $(0, \beta)$ for $\beta \ne 0$

ii) the loss is $W_3$ if the true parameter point is $(0, 0)$

where $W_1$, $W_2$, and $W_3$ are constants subject to some slight restrictions which
will be pointed out later.

The functions $R_i(\theta, E)$ are then the following:

$$R_1(\theta, E) = \begin{cases} W(\alpha^2 + \beta^2)\, G(\alpha, \beta) & \text{for } \alpha^2 + \beta^2 \ne 0 \\ 0 & \text{for } \alpha = \beta = 0 \end{cases}$$

$$R_2(\theta, E) = \begin{cases} W(\beta^2)\, G(\alpha, \beta) & \text{for } \beta \ne 0 \\ W_1 G(0, 0) & \text{for } \alpha = \beta = 0 \\ 0 & \text{for } \alpha \ne 0,\ \beta = 0 \end{cases}$$

$$R_3(\theta, E) = \begin{cases} W(\alpha^2)\, G(\alpha, \beta) & \text{for } \alpha \ne 0 \\ W_1 G(0, 0) & \text{for } \alpha = \beta = 0 \\ 0 & \text{for } \alpha = 0,\ \beta \ne 0 \end{cases}$$

$$R_4(\theta, E) = \begin{cases} W_2 G(\alpha, 0) & \text{for } \alpha \ne 0,\ \beta = 0 \\ W_2 G(0, \beta) & \text{for } \alpha = 0,\ \beta \ne 0 \\ W_3 G(0, 0) & \text{for } \alpha = \beta = 0 \\ 0 & \text{for } \alpha\beta \ne 0 \end{cases}$$

where $G(\alpha, \beta)$ is the normal distribution function

$$G(\alpha, \beta) = \frac{N}{2\pi}\, e^{-\frac{N}{2}[(\bar{x} - \alpha)^2 + (\bar{y} - \beta)^2]},$$

$\bar{x}$ and $\bar{y}$ being the sample means. It should be pointed out that the use of the
distribution of the sample means instead of the joint distribution of the observations
is justified since the sample means are sufficient statistics for the parameters
$\alpha$ and $\beta$.

We will use the notation $R_2(E)$ to denote the maximum of $R_2(\theta, E)$ with respect
to $\alpha$ and $\beta$, and it can easily be seen to be the maximum of two expressions which
we will denote by II(1) and II(2) where II(1) is the maximum of $W(\beta^2)G(\alpha, \beta)$
and II(2) is the maximum of $W_1 G(0, 0)$. Similarly, $R_3(E)$ is the maximum of
III(1) and III(2), and $R_4(E)$ is the maximum of IV(1), IV(2), and IV(3), where
these are the maxima of the two expressions involved in $R_3(\theta, E)$, and the three
expressions in $R_4(\theta, E)$, respectively.

We will first show that the function $R_1(E)$ is a monotonic increasing function
of $(\bar{x}^2 + \bar{y}^2)$. We know that the maximum of $R_1(\theta, E)$ is reached for values of
$\alpha$ and $\beta$ for which the partial derivatives of $R_1(\theta, E)$ with respect to $\alpha$ and $\beta$
are zero, i.e., for which

$$[N(\bar{x} - \alpha)\, W(\alpha^2 + \beta^2) + 2\alpha W'(\alpha^2 + \beta^2)]\, G(\alpha, \beta) = 0$$

and

$$[N(\bar{y} - \beta)\, W(\alpha^2 + \beta^2) + 2\beta W'(\alpha^2 + \beta^2)]\, G(\alpha, \beta) = 0$$

where $W'(\alpha^2 + \beta^2)$ is the derivative of $W(\alpha^2 + \beta^2)$ with respect to $(\alpha^2 + \beta^2)$.
Since $G(\alpha, \beta) \ne 0$, and $W'(\alpha^2 + \beta^2) \ne 0$, these relations imply

$$\frac{\bar{x} - \alpha}{\alpha} = \frac{\bar{y} - \beta}{\beta}$$

or $\beta\bar{x} = \alpha\bar{y}$. Thus the maximum of the function $R_1(\theta, E)$ occurs for values of
$\alpha$ and $\beta$ which satisfy the relation $\alpha = (\bar{x}/\bar{y})\beta$.

Consider any two straight lines $\alpha = (x'/y')\beta$ and $\alpha = (x''/y'')\beta$, and the values of the function $R_1(\theta, E)$ along these two lines. Obviously, the values of the first factor $W(\alpha^2 + \beta^2)$ are equal for points along the lines equidistant from the origin. Also, if the values of $x'$, $y'$, $x''$, and $y''$ are such that $x'^2 + y'^2 = x''^2 + y''^2$, the values of the function $G(\alpha, \beta)$ along both lines are equal for points equidistant from the origin, and it follows that $R_1(x', y') = R_1(x'', y'')$. Thus we have that $R_1(E)$ is a function of $(x^2 + y^2)$.

Note that if the value of $x''^2 + y''^2$ is greater than the value of $x'^2 + y'^2$, the curve representing the function $G(\alpha, \beta)$ along $\alpha = (x''/y'')\beta$ is the same as that along the line $\alpha = (x'/y')\beta$, but it is shifted further from the origin. The values of $W(\alpha^2 + \beta^2)$ are independent of $x$ and $y$ and the function is monotonic in $\alpha^2 + \beta^2$. Thus, the value of $G(\alpha, \beta)$ at which $R_1(\theta, E)$ is a maximum on $\alpha = (x'/y')\beta$ is multiplied by a larger value of $W(\alpha^2 + \beta^2)$ on $\alpha = (x''/y'')\beta$, so the maximum when $x''^2 + y''^2$ exceeds $x'^2 + y'^2$ is the greater. This proves that $R_1(E)$ is monotonically increasing in $(x^2 + y^2)$.
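
The argument above can be checked numerically. The following is only a sketch under stated assumptions: the weight function `weight` (here $W(z) = z$), the constant $C = N/2\pi$, the value $N = 10$, and the grid search are all illustrative choices that are not taken from the paper. The search is restricted to the ray through the origin and $(x, y)$, which is where the text shows the maximum must lie.

```python
import math

def risk_h0(alpha, beta, x, y, n, weight):
    """Risk density W(alpha^2 + beta^2) * G(alpha, beta) for sample means (x, y)."""
    c = n / (2.0 * math.pi)  # assumed normalising constant C
    g = c * math.exp(-0.5 * n * ((x - alpha) ** 2 + (y - beta) ** 2))
    return weight(alpha ** 2 + beta ** 2) * g

def max_risk_h0(x, y, n=10, weight=lambda z: z, t_max=10.0, steps=4000):
    """Approximate R_1(E) by a grid search along the ray through (x, y)."""
    r = math.hypot(x, y)
    ux, uy = (x / r, y / r) if r > 0 else (1.0, 0.0)
    best = 0.0
    for k in range(steps + 1):
        t = t_max * k / steps  # distance of (alpha, beta) from the origin
        best = max(best, risk_h0(t * ux, t * uy, x, y, n, weight))
    return best

# Same radius gives the same maximum; a larger radius gives a larger maximum.
same_radius = (max_risk_h0(0.3, 0.4), max_risk_h0(0.5, 0.0))
by_radius = [max_risk_h0(r, 0.0) for r in (0.2, 0.5, 1.0, 2.0)]
```

With these illustrative choices, `same_radius` holds two equal values and `by_radius` is increasing, in agreement with the statement that $R_1(E)$ is an increasing function of $x^2 + y^2$ alone.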

In a similar manner, we now proceed to show that II(1) is a monotonically increasing function of $y^2$. We know that a necessary condition for a maximum of II(1) is that

$$\frac{\partial\,\mathrm{II}(1)}{\partial \alpha} = \frac{\partial\,\mathrm{II}(1)}{\partial \beta} = 0.$$

The first of these two relations is

$$W(\beta^2)\,N(x - \alpha)\,G(\alpha, \beta) = 0$$

which has the solutions $W(\beta^2) = 0$ and $\alpha = x$. But $W(\beta^2) = 0$ only for $\beta = 0$ and this value is a minimum of II(1), hence we have that the maximum is reached for $\alpha = x$, so

$$\mathrm{II}(1) = \max_{\beta}\; W(\beta^2)\,C e^{-\frac{1}{2}N(y - \beta)^2}.$$

But along any two lines $\alpha$ = constant in the $(\alpha, \beta)$-plane, the function $W(\beta^2)$ has identical monotonically increasing values in $\beta^2$ and the normal density function is identical along two such lines for a fixed value of $y$. An increase in the value of $y^2$ displaces the normal function from the origin but does not affect its shape, hence the value of the normal density function at which II(1) takes on its maximum is multiplied by a greater value of $W(\beta^2)$ when $y^2$ is increased, so II(1) is monotonically increasing in $y^2$. In exactly the same manner, we find that III(1) is a monotonically increasing function of $x^2$.

Because the remaining functions are identical with the functions considered 
in the special case above, we have that 

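$$\mathrm{II}(2) = W_1 C e^{-\frac{1}{2}N(x^2 + y^2)}$$

$$\mathrm{III}(2) = W_1 C e^{-\frac{1}{2}N(x^2 + y^2)}$$

$$\mathrm{IV}(1) = W_2 C e^{-\frac{1}{2}N y^2}$$

$$\mathrm{IV}(2) = W_2 C e^{-\frac{1}{2}N x^2}$$

$$\mathrm{IV}(3) = W_3 C e^{-\frac{1}{2}N(x^2 + y^2)}$$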

Now it is apparent that $R_1(E)$ is never less than II(1) since

$$W(\alpha^2 + \beta^2)\,G(\alpha, \beta) \geq W(\beta^2)\,G(\alpha, \beta)$$

(the equality holds only for $\alpha = 0$) and since a function which is never less than a second function cannot have a maximum less than the maximum of the second function. Also $R_1(E)$ for the same reason is never less than III(1). Thus $R_1(E)$ can be the minimum of the four functions $R_i(E)$ at most when $R_2(E)$ is defined by II(2) and $R_3(E)$ is defined by III(2).

Since II(2) and III(2) are the same monotonic decreasing function of $(x^2 + y^2)$ and since $R_1(E)$ is a monotonic increasing function of $(x^2 + y^2)$, there is a value $r_0^2$ of $(x^2 + y^2)$ such that $R_1(E) < \mathrm{II}(2)$ when and only when $x^2 + y^2 < r_0^2$. But for all values $(x, y)$ we have that $R_1(E) \geq \mathrm{II}(1)$ and $R_1(E) \geq \mathrm{III}(1)$, hence for all values within the circle $x^2 + y^2 = r_0^2$ we have that

$$\mathrm{II}(1) \leq R_1(E) < \mathrm{II}(2) \tag{2}$$

and

$$\mathrm{III}(1) \leq R_1(E) < \mathrm{III}(2) \tag{3}$$

so it follows that $R_2(E)$ is defined by II(2) and $R_3(E)$ is defined by III(2) within the circle.

We restrict the values of $W_1$, $W_2$, and $W_3$ used in the definitions of the weight functions to be $W_1 \leq W_2 \leq W_3$, hence for all values of $(x, y)$

$$W_1 C e^{-\frac{1}{2}N(x^2 + y^2)} \leq W_2 C e^{-\frac{1}{2}N y^2}$$

$$W_1 C e^{-\frac{1}{2}N(x^2 + y^2)} \leq W_2 C e^{-\frac{1}{2}N x^2}$$

and

$$W_1 C e^{-\frac{1}{2}N(x^2 + y^2)} \leq W_3 C e^{-\frac{1}{2}N(x^2 + y^2)}$$

so $R_4(E)$ is at least as great as II(2) over the whole plane; hence, in light of relation (2), $R_4(E)$ is at least as great as $R_1(E)$ for $x^2 + y^2 \leq r_0^2$. Therefore, since (2) shows that $R_1(E) < R_2(E)$ within the circle; (3) shows that $R_1(E) < R_3(E)$ within the circle; and since quite obviously the relations do not hold outside the circle, we have that $M_1$ is the set of points

$$x^2 + y^2 < r_0^2.$$

To determine the region $M_2$, we must determine those points outside $M_1$ for which $R_2(E) < R_3(E)$ and $R_2(E) < R_4(E)$. Consider first the part of the plane outside $M_1$ for which $R_2(E)$ is defined by II(2). This is the region for which II(2) > II(1). Consider the curve in the plane defined by II(2) = II(1), that is,

$$W_1 C e^{-\frac{1}{2}N(x^2 + y^2)} = \mathrm{II}(1).$$

We take differentials and have

$$-N W_1 C e^{-\frac{1}{2}N(x^2 + y^2)}\,[x\,dx + y\,dy] = 2y\,[d\,\mathrm{II}(1)/d(y^2)]\,dy$$

but this shows that $dy/dx$ has the opposite sign from $y/x$ since $d\,\mathrm{II}(1)/d(y^2)$ is always positive. Also note that for $x = 0$, the equation $R_1(E) = \mathrm{II}(2)$ is identical with the equation II(1) = II(2), so for $x = 0$, we have II(1) > II(2) when $|y| > r_0$ and II(1) < II(2) when $|y| < r_0$. Furthermore, the curve II(1) = II(2) crosses the $x$ axis at a finite value of $x$, since for $y = 0$, II(1) is a constant while II(2) is a decreasing function of $x$.

We will refer to the various regions in the first quadrant of the $(x, y)$-plane shown in Figure I as follows: $A$ is the part of the quadrant which is $M_1$; $A$, $B$, $B'$, and $C$ are the regions in which $R_2(E)$ is defined by II(2), that is, in which II(2) > II(1); and in the same manner, $A$, $B$, $B'$, and $C'$ are the regions in which $R_3(E)$ is defined by III(2).

Since II(2) and III(2) are identical, we see that within the regions $B$ and $B'$, $R_2(E) = R_3(E)$ since in these regions $R_2(E)$ is defined by II(2) and $R_3(E)$ is defined by III(2). We have previously pointed out that II(2) is never greater than $R_4(E)$, hence it is clear that $B$ and $B'$ should belong to either $M_2$ or $M_3$, and we will arbitrarily decide that $B$ is part of $M_2$ and $B'$ part of $M_3$.

Consider then the region $C$; here $R_2(E)$ is defined by II(2) and $R_3(E)$ by III(1), so within $C$

$$\mathrm{II}(2) = \mathrm{III}(2) < \mathrm{III}(1) = R_3(E)$$

and again $\mathrm{II}(2) \leq R_4(E)$, so the region $C$ is part of $M_2$. By the same argument we have that $C'$ is a part of $M_3$ since within $C'$

$$\mathrm{III}(2) = \mathrm{II}(2) < \mathrm{II}(1) = R_2(E)$$

and $\mathrm{III}(2) \leq R_4(E)$.

Now consider the remainder of the quadrant outside $A$, $B$, $B'$, $C$, and $C'$. Here $R_2(E)$ is defined by II(1) and $R_3(E)$ is defined by III(1). Since II(1) is the same monotone increasing function of $y^2$ as III(1) is of $x^2$, we have II(1) > III(1) for $|y| > |x|$ and II(1) < III(1) for $|x| > |y|$. Thus we see that in the region under discussion, $R_2(E)$ is a minimum at most in the regions $D$ and $E$ and $R_3(E)$ a minimum at most in $D'$ and $E'$.

In order to determine, then, that part of $D$ and $E$ which belongs to $M_2$, we seek the region for which

II(1) < IV(1) when $R_4(E)$ is defined by IV(1)

II(1) < IV(2) when $R_4(E)$ is defined by IV(2)

II(1) < IV(3) when $R_4(E)$ is defined by IV(3).

But within $D$ and $E$ we have that $y^2 < x^2$, so it follows that IV(1) > IV(2), so $R_4(E)$ is never defined by IV(2) in $D$ or $E$. Hence we need determine the points which satisfy the first and third of these relations. Now it is clear that the relation II(1) < IV(1) is equivalent to the relation $|y| < y_0$ for some value $y_0$ since II(1) is monotonically increasing in $y^2$ and IV(1) is monotonically decreasing in $y^2$. Let $y = y_0$ be the line dividing $D$ and $E$.

We impose a restriction on $W_3$ such that $D$ is part of $M_2$ and $E$ is part of $M_4$. This restriction is that within $E$, $\mathrm{IV}(3) \leq \mathrm{IV}(1)$; note that since we are concerned only with $|y| < |x|$, this imposes the greatest restriction on $W_3$ when $x = y = y_0$, so we are requiring that

$$W_3 C e^{-\frac{1}{2}N(y_0^2 + y_0^2)} \leq W_2 C e^{-\frac{1}{2}N y_0^2}$$

or

$$W_3 \leq W_2\, e^{+\frac{1}{2}N y_0^2}.$$

It is simple to see that because of symmetry with respect to both axes and the origin, $M_2$ is defined by $x^2 + y^2 > r_0^2$ and $|y| < |x|$ and $|y| < y_0$; $M_3$ by $x^2 + y^2 > r_0^2$ and $|x| < |y|$ and $|x| < x_0$; and $M_4$ by $x^2 + y^2 > r_0^2$ and $|y| > y_0$ and $|x| > x_0$. It should be pointed out that $x_0 = y_0$.
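
The four regions just described can be expressed as a small decision rule. This is a minimal sketch only: the function name `decide_two_means` is hypothetical, and the constants `r0` and `y0` are passed in by the caller, since their values depend on $W$, $W_1$, $W_2$, $W_3$ and $N$ and are not computed here. Points lying exactly on a boundary are assigned arbitrarily.

```python
def decide_two_means(x, y, r0, y0):
    """Return 1, 2, 3 or 4 for the region M_1..M_4 containing the sample means.

    r0 is the radius of the circle bounding M_1 and y0 the half-width of the
    strips forming M_2 and M_3 (x0 = y0 by symmetry). Both are assumed known.
    """
    ax, ay = abs(x), abs(y)
    if x * x + y * y < r0 * r0:
        return 1  # inside the circle: M_1
    if ay > y0 and ax > y0:
        return 4  # both sample means far from zero: M_4
    if ay < ax:
        return 2  # |y| < |x| and |y| <= y0: M_2
    return 3      # |x| <= |y| and |x| <= x0: M_3

# Illustrative values of r0 and y0 (not taken from the paper).
regions = [decide_two_means(x, y, r0=1.0, y0=0.8)
           for x, y in [(0.0, 0.0), (3.0, 0.1), (0.1, 3.0), (3.0, 3.0)]]
```

The sketch makes the point of the concluding remarks concrete: once the constants are fixed, assigning a given sample point to its region of acceptance requires only a few comparisons.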

We now consider the general case with a known covariance matrix. Consider the jointly normally distributed variates $X_1^*, X_2^*, \cdots, X_p^*$ whose covariance matrix is $\|\sigma_{ij}^*\|$ $(i, j = 1, 2, \cdots, p)$, where the $\sigma_{ij}^*$'s are all known and where $\|\sigma_{ij}^*\|$ is positive definite. The mean values of the $X_i^*$'s are $\beta_1, \beta_2, \cdots, \beta_p$, which are unknown. It is simple to see that we can consider new variates $X_i = X_i^*/\sqrt{\sigma_{ii}^*}$ whose mean values are $\alpha_i = \beta_i/\sqrt{\sigma_{ii}^*}$ and whose covariance matrix is $\|\sigma_{ij}\|$ where $\sigma_{ii} = 1$. If a sample of $N$ independent observations on the $X_i^*$'s is given, we have immediately the observations on the $X_i$'s, and we denote the sample means of the $X_i$'s by $x_1, x_2, \cdots, x_p$, respectively.

There are $2^p$ hypotheses among which we wish to choose; as notation, we let

$H_0$ be $\alpha_1 = \alpha_2 = \cdots = \alpha_p = 0$

$H_1$ be $\alpha_1 \neq 0$, $\alpha_2 = \alpha_3 = \cdots = \alpha_p = 0$

$H_2$ be $\alpha_2 \neq 0$, $\alpha_1 = \alpha_3 = \cdots = \alpha_p = 0$

$H_{12}$ be $\alpha_1\alpha_2 \neq 0$, $\alpha_3 = \alpha_4 = \cdots = \alpha_p = 0$

etc. As a further abbreviation, let $H^1$ denote any one of the $p$ hypotheses $H_1, H_2, \cdots, H_p$; let $H^2$ denote any of the $\binom{p}{2}$ hypotheses $H_{12}, H_{13}, \cdots$; $H^3$ denote any of the $\binom{p}{3}$ hypotheses $H_{123}, H_{124}, \cdots$; etc. Also let $M_{i_1 i_2 \cdots i_k}$ be the region of the sample space for which we accept the hypothesis $H_{i_1 i_2 \cdots i_k}$, and let $R_{i_1 i_2 \cdots i_k}(\theta, E) = W(\theta, H_{i_1 i_2 \cdots i_k})\,f(E \mid \theta)$ be the risk density function if the hypothesis $H_{i_1 i_2 \cdots i_k}$ is chosen, where we have used the notation $\theta$ to represent the parameter point $(\alpha_1, \alpha_2, \cdots, \alpha_p)$.

We will also adopt the following notations: in referring to the parameter point $(\alpha_1, \alpha_2, \cdots, \alpha_p)$, we will write $(i_1, i_2, \cdots, i_k) = 0$ to mean all points for which $\alpha_{i_1} = \alpha_{i_2} = \cdots = \alpha_{i_k} = 0$ and $(\alpha_{j_1})(\alpha_{j_2}) \cdots (\alpha_{j_s}) \neq 0$, where $i_1, i_2, \cdots, i_k, j_1, j_2, \cdots, j_s$ are a permutation of the integers $1, 2, \cdots, p$. Furthermore, we will write $[j_1, j_2, \cdots, j_s] \neq 0$ to mean $(i_1, i_2, \cdots, i_k) = 0$.

By $Q$ we denote the covariance matrix of the $X_i$'s and by $L$ its inverse; we will denote the elements of $L$ by $\lambda_{ij}$. By $Q^{i_1 i_2 \cdots i_k}$ we denote the matrix obtained by striking out rows $i_1, i_2, \cdots, i_k$ and columns $i_1, i_2, \cdots, i_k$ from $Q$; by $L^{i_1 i_2 \cdots i_k}$ we denote the inverse of the matrix $Q^{i_1 i_2 \cdots i_k}$, and we will write the elements of $L^{i_1 i_2 \cdots i_k}$ as $\lambda_{ij}^{i_1 i_2 \cdots i_k}$. Thus we can write the joint distribution of the set of sample means $x_1, x_2, \cdots, x_p$ as

$$C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}(x_i - \alpha_i)(x_j - \alpha_j)}. \tag{4}$$

Concerning the definition of the weight function, we will assume the following:

I. If $H_0$ is accepted,

i) the loss is $W(\sum\sum \lambda_{ij}\alpha_i\alpha_j)$ if the true parameter point is $(\alpha_1, \alpha_2, \cdots, \alpha_p)$, where $W$ is a continuous, strictly monotonic increasing function whose value is zero if $(1, 2, \cdots, p) = 0$. The function is restricted to increase slowly enough that the product of it and the density function (4) has a finite maximum with respect to the $\alpha_i$'s.

II. If $H^1$ is accepted,

i) consider in particular $H_a$, then for all parameter points except $(1, 2, \cdots, p) = 0$, the value of the loss is $W(\sum\sum \lambda_{ij}^{a}\alpha_i\alpha_j)$, where $W$ is the function defined above.

ii) the loss is $W_0^1$ if the true parameter point is $(1, 2, \cdots, p) = 0$.

III. If $H^2$ is accepted,

i) consider in particular $H_{ab}$, then for all parameter points except $(1, 2, \cdots, p) = 0$ and $[a] \neq 0$ and $[b] \neq 0$, the loss is $W(\sum\sum \lambda_{ij}^{ab}\alpha_i\alpha_j)$, where $W$ is the function defined above,

ii) the loss is $W_1^2$ if the true parameter point is either $[a] \neq 0$ or $[b] \neq 0$, where $W_1^2 \geq W_0^1$,

iii) the loss is $W_0^2$ if the true parameter point is $(1, 2, \cdots, p) = 0$, where $W_0^2 \geq W_1^2$.

In general, if $H^k$ is accepted,

i) consider in particular $H_{i_1 i_2 \cdots i_k}$, then for all parameter points except $(1, 2, \cdots, p) = 0$, $[i_1] \neq 0$, $[i_2] \neq 0$, $\cdots$, $[i_1, i_2] \neq 0$, $[i_1, i_3] \neq 0$, $\cdots$, etc., the loss is $W(\sum\sum \lambda_{ij}^{i_1 i_2 \cdots i_k}\alpha_i\alpha_j)$,

ii) the loss is $W_r^k$ $(r = 1, 2, \cdots, k - 1)$ if $[i_{j_1}, i_{j_2}, \cdots, i_{j_r}] \neq 0$, where $j_1, j_2, \cdots, j_r$ are $r$ different positive integers less than or equal to $k$.

Also WU S Wt -2 ^ ••• g W\, W k k Z\ g Wj_ 2 , W h k Zl £ Wizl g Wl+, 
etc. 

iii) the loss is $W_0^k$ if $(1, 2, \cdots, p) = 0$, where $W_1^k \leq W_0^k$,

where the $W_r^k$ are constants subject to some further slight restrictions which we will impose later. The $\sum\sum$ has been used throughout to denote summation over all values which $i$ and $j$ take on in $\lambda_{ij}^{i_1 i_2 \cdots i_k}$.

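We consider first the risk density function corresponding to $H_0$, that is

$$R_0(\theta, E) = W\left(\sum\sum \lambda_{ij}\alpha_i\alpha_j\right) C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}(x_i - \alpha_i)(x_j - \alpha_j)}.$$

To maximize $R_0(\theta, E)$, we have the set of $p$ equations obtained by setting the $p$ partials of $R_0(\theta, E)$ with respect to the $\alpha_i$ equal to zero, which are necessary conditions. We have

$$\frac{\partial R_0}{\partial \alpha_i} = \left\{ 2\left[\sum_j \lambda_{ij}\alpha_j\right] \frac{\partial W}{\partial\left(\sum\sum \lambda_{ij}\alpha_i\alpha_j\right)} + \left[N \sum_j \lambda_{ij}(x_j - \alpha_j)\right] W \right\} C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}(x_i - \alpha_i)(x_j - \alpha_j)}$$

so the necessary conditions are

$$2\left[\sum_j \lambda_{ij}\alpha_j\right] \frac{\partial W}{\partial\left(\sum\sum \lambda_{ij}\alpha_i\alpha_j\right)} + \left[N \sum_j \lambda_{ij}(x_j - \alpha_j)\right] W = 0 \qquad (i = 1, 2, \cdots, p).$$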

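This can also be written

$$\left(2\sum_j \lambda_{ij}\alpha_j\right) D_z W(z) + W(z)\,N \sum_j \lambda_{ij}(x_j - \alpha_j) = 0$$

where we have set $z = \sum\sum \lambda_{ij}\alpha_i\alpha_j$ and where we use the notation $D_z$ to indicate differentiation with respect to $z$. Fix $i$ at two particular values, say $a$ and $b$; then two of the equations of this system can be written

$$\left(2\sum_j \lambda_{aj}\alpha_j\right) D_z W(z) + W(z)\,N \sum_j \lambda_{aj}(x_j - \alpha_j) = 0$$

$$\left(2\sum_j \lambda_{bj}\alpha_j\right) D_z W(z) + W(z)\,N \sum_j \lambda_{bj}(x_j - \alpha_j) = 0$$

that is

$$\left(\sum_j \lambda_{aj}\alpha_j\right)\left[\sum_j \lambda_{bj}(x_j - \alpha_j)\right] = \left(\sum_j \lambda_{bj}\alpha_j\right)\left[\sum_j \lambda_{aj}(x_j - \alpha_j)\right]$$

or

$$\left(\sum_j \lambda_{aj}\alpha_j\right)\left(\sum_j \lambda_{bj}x_j\right) = \left(\sum_j \lambda_{bj}\alpha_j\right)\left(\sum_j \lambda_{aj}x_j\right).$$

This we can write as

$$\sum\sum \lambda_{aj}\lambda_{bk}\alpha_j x_k = \sum\sum \lambda_{bk}\lambda_{aj}\alpha_k x_j$$

or

$$\sum\sum \lambda_{aj}\lambda_{bk}(\alpha_j x_k - \alpha_k x_j) = 0.$$

Giving $a$ and $b$ the $p^2$ combinations of values which are possible, this is a set of $p^2$ linear homogeneous equations in the $p^2$ unknowns $(\alpha_j x_k - \alpha_k x_j)$ which has the obvious solution $\alpha_j x_k - \alpha_k x_j = 0$ or $\alpha_j x_k = \alpha_k x_j$.

Thus we have that the maximum of the function $R_0(\theta, E)$ is reached for a set of values of the $\alpha_i$'s which lie on the straight line

$$\alpha_i = (x_i/x_1)\,\alpha_1. \tag{5}$$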

The function $R_0(E)$, which is the maximum of $R_0(\theta, E)$ with respect to the $\alpha_i$'s, is a monotonically increasing function of $\sum\sum \lambda_{ij}x_i x_j$, which we show in the following manner. Because of (5), we see that

$$\sum\sum \lambda_{ij}(x_i - \alpha_i)(x_j - \alpha_j) = \sum\sum \lambda_{ij}[x_i - (x_i/x_1)\alpha_1][x_j - (x_j/x_1)\alpha_1] = \sum\sum \lambda_{ij}x_i x_j\,[1 - (\alpha_1/x_1)]^2.$$

Also,

$$\sum\sum \lambda_{ij}\alpha_i\alpha_j = \sum\sum \lambda_{ij}x_i x_j\,(\alpha_1/x_1)^2.$$

Hence we see that $R_0(E)$ is the maximum with respect to $\alpha_1$ of

$$W\left(\sum\sum \lambda_{ij}x_i x_j\,(\alpha_1/x_1)^2\right) C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}x_i x_j\,[1 - (\alpha_1/x_1)]^2}$$

so for two sample points $E' = (x_1', x_2', \cdots, x_p')$ and $E'' = (x_1'', x_2'', \cdots, x_p'')$ such that $\sum\sum \lambda_{ij}x_i' x_j' = \sum\sum \lambda_{ij}x_i'' x_j''$, it is clear that $R_0(E') = R_0(E'')$; thus $R_0(E)$ is a function of $\sum\sum \lambda_{ij}x_i x_j$.

But then without loss of generality, we can consider $R_0(E)$ along the $x_1$ axis, i.e. for $x_2 = x_3 = \cdots = x_p = 0$. Using relation (5), we see that this implies that the maximizing parameter values are $\alpha_2 = \alpha_3 = \cdots = \alpha_p = 0$. But then

$$R_0(E) = \max_{\alpha_1}\; W(\lambda_{11}\alpha_1^2)\,C e^{-\frac{1}{2}N\lambda_{11}(x_1 - \alpha_1)^2}$$

which we have previously shown is a monotonic increasing function of $x_1^2$. Therefore $R_0(E)$ is a monotonic increasing function of $\sum\sum \lambda_{ij}x_i x_j$.

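We will furthermore show that the maximum of each risk density function corresponding to parts i) as given in the weight functions is a monotonically increasing function of a certain quadratic form in the $x_i$. Consider for example the function corresponding to part i) of $R_1(\theta, E)$, that is

$$W\left(\sum\sum \lambda_{ij}^{1}\alpha_i\alpha_j\right) C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}(x_i - \alpha_i)(x_j - \alpha_j)}. \tag{6}$$

We will write the maximum of this function with respect to the $\alpha_i$'s as $R_1(i)$. Note that the weight function is not a function of $\alpha_1$, hence the partial derivative of (6) with respect to $\alpha_1$ set equal to zero is equivalent to

$$\sum_j \lambda_{1j}(x_j - \alpha_j) = 0.$$

Squaring this relation and multiplying by $N/2\lambda_{11}$ gives

$$(N/2\lambda_{11}) \sum\sum \lambda_{1i}\lambda_{1j}(x_i - \alpha_i)(x_j - \alpha_j) = 0$$

so we can write the exponent in (6)

$$\text{Exp.} = -(N/2\lambda_{11}) \sum\sum (\lambda_{11}\lambda_{ij} - \lambda_{1i}\lambda_{1j})(x_i - \alpha_i)(x_j - \alpha_j).$$

Because of the definition of $\lambda_{ij}$, if we write $\omega_{ij}$ for the cofactor of $\sigma_{ij}$ in $|\sigma_{ij}|$, we have

$$\text{Exp.} = -[N/(2\lambda_{11}\,|\sigma_{ij}|^2)] \sum\sum (\omega_{11}\omega_{ij} - \omega_{1i}\omega_{1j})(x_i - \alpha_i)(x_j - \alpha_j).$$

But by a well known algebraic identity²,

$$\omega_{11}\omega_{ij} - \omega_{1i}\omega_{1j} = |\sigma_{ij}| \cdot [\text{cofactor of } (\sigma_{11}\sigma_{ij} - \sigma_{1i}\sigma_{1j}) \text{ in } |\sigma_{ij}|] = |\sigma_{ij}| \cdot \omega_{ij}^{1}$$

where we have written $\omega_{ij}^{1}$ to be the cofactor of $\sigma_{ij}$ in $|\sigma_{ij}^{1}|$, so

$$\text{Exp.} = -(N/2\lambda_{11}\,|\sigma_{ij}|) \sum\sum \omega_{ij}^{1}(x_i - \alpha_i)(x_j - \alpha_j).$$

But $\lambda_{11}\,|\sigma_{ij}| = \omega_{11} = |\sigma_{ij}^{1}|$, hence

$$\text{Exp.} = -\tfrac{1}{2}N \sum\sum \lambda_{ij}^{1}(x_i - \alpha_i)(x_j - \alpha_j).$$

Therefore

$$R_1(i) = \max_{\text{all } \alpha_i}\; W\left(\sum\sum \lambda_{ij}^{1}\alpha_i\alpha_j\right) C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}^{1}(x_i - \alpha_i)(x_j - \alpha_j)}.$$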

² See M. Bôcher, Introduction to Higher Algebra.

But then it follows in exactly the same way as with $R_0(E)$ that $R_1(i)$ is a monotonically increasing function of $\sum\sum \lambda_{ij}^{1}x_i x_j$. For the other functions $R_k(i)$ corresponding to other hypotheses $H^1$ the argument is identical, and for risk density functions corresponding to hypotheses with more than one $\alpha_i \neq 0$, the same argument is repeated two or more times in succession to give the result.
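We will show that for any value of the parameters $\alpha_1, \alpha_2, \cdots, \alpha_p$ the relation

$$\sum\sum \lambda_{ij}^{1}\alpha_i\alpha_j \leq \sum\sum \lambda_{ij}\alpha_i\alpha_j$$

holds. This relation is true if the relation

$$\sum\sum [(\omega_{ij}/|\sigma_{ij}|) - (\omega_{ij}^{1}/|\sigma_{ij}^{1}|)]\,\alpha_i\alpha_j \geq 0 \tag{7}$$

is true, where we define $\omega_{1j}^{1} = 0$. That is, if

$$(1/|\sigma_{ij}|\,|\sigma_{ij}^{1}|) \sum\sum (\omega_{ij}\omega_{11} - |\sigma_{ij}|\,\omega_{ij}^{1})\,\alpha_i\alpha_j \geq 0$$

where we have substituted $\omega_{11}$ for its equal $|\sigma_{ij}^{1}|$. But note that

$$\omega_{ij}^{1} = \text{cofactor of } (\sigma_{11}\sigma_{ij} - \sigma_{1i}\sigma_{1j}) \text{ in } |\sigma_{ij}|$$

hence by the identity quoted (see footnote 2)

$$|\sigma_{ij}|\,\omega_{ij}^{1} = \omega_{11}\omega_{ij} - \omega_{1i}\omega_{1j}$$

so the left hand member of relation (7) is

$$(1/|\sigma_{ij}|\,|\sigma_{ij}^{1}|) \sum\sum (\omega_{ij}\omega_{11} - \omega_{11}\omega_{ij} + \omega_{1i}\omega_{1j})\,\alpha_i\alpha_j$$

$$= (1/|\sigma_{ij}|\,|\sigma_{ij}^{1}|) \sum\sum \omega_{1i}\omega_{1j}\alpha_i\alpha_j$$

$$= \left(\sum_i \omega_{1i}\alpha_i\right)^2 / (|\sigma_{ij}|\,|\sigma_{ij}^{1}|)$$

$$\geq 0$$

since all matrices here are symmetric and positive definite. Note that the argument can be repeated one or more times to show

$$W\left(\sum\sum \lambda_{ij}\alpha_i\alpha_j\right) \geq W\left(\sum\sum \lambda_{ij}^{i_1 i_2 \cdots i_k}\alpha_i\alpha_j\right)$$

or

$$W\left(\sum\sum \lambda_{ij}^{j_1 j_2 \cdots j_s}\alpha_i\alpha_j\right) \geq W\left(\sum\sum \lambda_{ij}^{i_1 i_2 \cdots i_k}\alpha_i\alpha_j\right)$$

where $i_1, i_2, \cdots, i_k$ are any set of $k$ different integers less than or equal to $p$, and $j_1, j_2, \cdots, j_s$ are any subset of $i_1, i_2, \cdots, i_k$.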
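
The inequality between the quadratic forms can be illustrated numerically. The sketch below assumes a small covariance matrix and parameter point chosen only for illustration; the helper names `inverse` and `reduced_form` are hypothetical, and indices are 0-based in the code whereas the text numbers the variates from 1.

```python
def inverse(m):
    """Invert a small positive definite matrix by Gauss-Jordan elimination."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def reduced_form(q, alpha, struck=()):
    """Quadratic form in alpha using the inverse of Q with rows/columns `struck` removed."""
    keep = [i for i in range(len(q)) if i not in struck]
    if not keep:
        return 0.0
    lam = inverse([[q[i][j] for j in keep] for i in keep])
    return sum(lam[a][b] * alpha[i] * alpha[j]
               for a, i in enumerate(keep) for b, j in enumerate(keep))

# Illustrative covariance matrix (unit variances) and parameter point.
q = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
alpha = [0.7, -1.2, 0.4]
full = reduced_form(q, alpha)                  # no rows struck
one = reduced_form(q, alpha, struck=(0,))      # first variate struck
two = reduced_form(q, alpha, struck=(0, 1))    # first two variates struck
```

Each additional struck row and column can only lower the value of the form, so `full >= one >= two`, which is the chain of inequalities the text obtains by repeating the argument.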

Consider the maximum of the expressions

$$W_r^k\, C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}(x_i - \alpha_i)(x_j - \alpha_j)}.$$

We know that $(p - r)$ of the $\alpha_i$'s in these expressions are zero and by an argument similar to that given above³, it is clear that if the $r$ $\alpha$'s not equal to zero are $\alpha_{i_1}, \alpha_{i_2}, \cdots, \alpha_{i_r}$, then the maximum of the expressions is given by

$$W_r^k\, C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}^{i_1 i_2 \cdots i_r} x_i x_j}.$$

³ See p. 36.

Also for $r = 0$, the maximum is obviously

$$W_0^k\, C e^{-\frac{1}{2}N \sum\sum \lambda_{ij} x_i x_j}.$$

Recall that we have restricted the $W_r^k$'s so that

(8) Wl £ W\ £ * • ^ WS and £ • ■ • ^ WS . 

From a previous calculation, it follows that

$$\sum\sum \lambda_{ij}x_i x_j \geq \sum\sum \lambda_{ij}^{i_1}x_i x_j \geq \cdots. \tag{9}$$

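We can then quite easily calculate the region $M_0$, that is, the region of the sample space for which $R_0(E)$ is the minimum of all the $R_{i_1 i_2 \cdots i_k}(E)$'s. We have pointed out that

$$W\left(\sum\sum \lambda_{ij}\alpha_i\alpha_j\right) \geq W\left(\sum\sum \lambda_{ij}^{i_1 i_2 \cdots i_k}\alpha_i\alpha_j\right)$$

so it follows that

$$R_0(E) \geq R_{i_1 i_2 \cdots i_k}(i) \tag{10}$$

that is

$$R_0(E) \geq R_{i_1 i_2 \cdots i_k}(E)$$

so long as $R_{i_1 i_2 \cdots i_k}(E)$ is defined by $R_{i_1 i_2 \cdots i_k}(i)$.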

From the relations (8) and (9), we have that

$$W_0^1\, C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}x_i x_j} \leq W_0^k\, C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}x_i x_j} \tag{11}$$

for $k = 2, 3, \cdots, p$. Now because

$$W_0^1\, C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}x_i x_j}$$

is a monotonic decreasing function of $\sum\sum \lambda_{ij}x_i x_j$, and because $R_0(E)$ is a monotonically increasing function of $\sum\sum \lambda_{ij}x_i x_j$, there is a value $r_0^2$ such that within the ellipse $\sum\sum \lambda_{ij}x_i x_j = r_0^2$, the relation

$$R_0(E) < W_0^1\, C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}x_i x_j} \tag{12}$$

holds, and outside it the opposite inequality holds. But from relations (10) and (12), it follows that within this ellipse, no $R_{i_1 i_2 \cdots i_k}(E)$ except $R_0(E)$ can be defined by $R_{i_1 i_2 \cdots i_k}(i)$. Then in view of relation (11) and since a quantity is certainly less than the maximum of several quantities if it is less than one of those several quantities, the region $M_0$ is the set of points $\sum\sum \lambda_{ij}x_i x_j < r_0^2$.

Now consider the functions $R_a(E)$ in the region outside $M_0$. We know that $R_a(E) = R_a(i)$ when

$$\max\; W\left(\sum\sum \lambda_{ij}^{a}\alpha_i\alpha_j\right) C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}^{a}(x_i - \alpha_i)(x_j - \alpha_j)} \geq W_0^1\, C e^{-\frac{1}{2}N \sum\sum \lambda_{ij}x_i x_j}$$

and we will write $R_a(E) = R_a(ii)$ when the opposite inequality holds. Consider a part of the sample space outside $M_0$ in which

$$R_{i_1}(E) = R_{i_1}(ii)$$

$$R_{i_2}(E) = R_{i_2}(ii)$$

$$\cdots$$

$$R_{i_k}(E) = R_{i_k}(ii)$$

where $k \geq 1$, and where $R_j(E) \neq R_j(ii)$ for $j \neq i_1, i_2, \cdots, i_k$. We see in this case that $R_{i_1}(E) = R_{i_2}(E) = \cdots = R_{i_k}(E) < R_j(E)$, where again $j \neq i_1, i_2, \cdots, i_k$. Furthermore, in this case, because of the relation (11), we have that $E$ should be a point of either $M_{i_1}, M_{i_2}, \cdots$, or $M_{i_k}$. We will arbitrarily decide in this case that $E$ should be a point of $M_{i_s}$ ($s$ an integer $\leq k$) where $i_s$ is determined so that

$$\sum\sum \lambda_{ij}^{i_s}x_i x_j \leq \sum\sum \lambda_{ij}^{i_t}x_i x_j \quad \text{for any } t = 1, 2, \cdots, k.$$

Now consider the region in which $R_r(E) = R_r(i)$ for all $r = 1, 2, \cdots, p$. We see that each $R_r(i)$ is the same monotonically increasing function of a quadratic form of the type $\sum\sum \lambda_{ij}^{r}x_i x_j$. Hence in order that $E$ be a point of a particular $M_r$, it is necessary that

$$\sum\sum \lambda_{ij}^{r}x_i x_j \leq \sum\sum \lambda_{ij}^{s}x_i x_j \quad \text{for all } s \neq r. \tag{13}$$

Now let us consider a fixed $r$ and compare $R_r(i)$ with all $R_{r i_1 i_2 \cdots i_k}(E)$'s for $k \geq 1$. We have pointed out that

$$\sum\sum \lambda_{ij}^{r}\alpha_i\alpha_j \geq \sum\sum \lambda_{ij}^{r i_1 i_2 \cdots i_k}\alpha_i\alpha_j \tag{14}$$

so $R_r(i) \geq R_{r i_1 i_2 \cdots i_k}(i)$ and hence $R_r(i)$ can be a minimum at most when all $R_{r i_1 i_2 \cdots i_k}(E)$'s are defined by other than $R_{r i_1 i_2 \cdots i_k}(i)$.

Consider then, any R r i x (E) when defined by other than i£ rtl (z), that is when 
Rri^E) is equal to one of 

W\Ce ~» H rii (n) (say) 

WlCe*****™ - Rri x (iii) (say) 

= (Hay) 

Because of the relations ( 8 ) and (14), we have that 

Rr h (E) g- 

whenever these are defined by other than and Furthermore 

in the region defined by (13), we see that J? r< 1 (u) § i? r , 1 (m), hence &<, (£) is 
never defined by /? r<I (i«) in this region. 

Now the relation i? r (*) < Rr< l iii) is easily seen to be equivalent to the relation 

(15) 2 2 \<jXiXj < r? 
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for some value n . With the restriction on W\ that it be not so much larger 
than Wi that when (12) does not hold, fJ ril (E) is not defined by we have 

that the region for which R r (i) < is the region defined by (13) and 

(15). 

We then restrict the relationship between the constants W\ and Wt to be 
such that for all points outside of Mo but within the region defined by (13) and 
(15), the relation 22Xj} ,r " , *®<x/ ^ 22X5##,- holds for ji , fa , ••• each 
different from r. Note that this is not an unreasonable restriction since the right 
hand side of the relation is bounded above by r \, ZXXijXiXj is bounded below by 
r\ , and therefore, XX\\) n "' ik x&j is bounded below by some positive value t* 2 
where r 2 is a monotonically increasing function of r \. 

Using a similar method, the region can be obtained after all regions 

.. im for all m < k have been derived. If some further restrictions are 
imposed on the constants in the weight functions similar to those formulated 
in deriving the region M r , it can be shown that the region M tl , 2 ... tjb (fc § 1) 
will be given by the inequalities 

22 \%jXtXj ^ 

22^ 7*m for all m < k and all ji , • • • , j m 

22 Xi} i2 "' ik XiXj £ 22 WiY 2 ^ k XiXi for all ji, ••• , j k 

and 

SSXjr 'S,*,- < rl . 

Thus we have rationalized the following solution of the question posed at the beginning of section 4. We test the hypothesis $E(x_1) = E(x_2) = \cdots = E(x_p) = 0$ using the generalized Student ratio, replacing the sample covariance matrix by the population covariance matrix since the latter is assumed to be known, at some chosen level of significance. If the hypothesis is not rejected, we make the decision corresponding to $H_0$. If the ratio is significant, we compute the ratios $T^1, T^2, \cdots, T^p$, where by definition $T^{i_1 i_2 \cdots i_k}$ is the generalized Student ratio computed for $x_{j_1}, x_{j_2}, \cdots, x_{j_s}$ ($i_1, i_2, \cdots, i_k, j_1, j_2, \cdots, j_s$ is a permutation of the integers $1, 2, \cdots, p$), the variates $x_{i_1}, x_{i_2}, \cdots, x_{i_k}$ being ignored.

We consider the smallest of the ratios computed on the basis of $(p - 1)$ of the $x_i$'s; say it is $T^r$. Then if $T^r$ is not significant at some level of significance (which need not be the same level as considered before), we make the decision corresponding to $H_r$; if $T^r$ is significant, we compute all the ratios based on $(p - 2)$ of the $x$'s. If $T^{rs}$ is the smallest of these, we make the decision corresponding to $H_{rs}$ if $T^{rs}$ is not significant, but proceed to calculate the ratios based on $(p - 3)$ of the $x_i$'s if it is significant, and so on.
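
The stepwise procedure can be sketched in code. The following makes several assumptions that are not in the text: the ratio with known covariance matrix is taken to be $N \sum\sum \lambda_{ij}x_i x_j$ computed on the variates retained, the significance points are supplied by the caller in a hypothetical lookup `critical` keyed by the number of variates used, and all function names are illustrative. Indices are 0-based.

```python
from itertools import combinations

def inverse(m):
    """Invert a small positive definite matrix by Gauss-Jordan elimination."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def student_ratio(q, xbar, n_obs, ignored=()):
    """Ratio computed from the sample means, with the variates in `ignored` left out."""
    keep = [i for i in range(len(xbar)) if i not in ignored]
    if not keep:
        return 0.0
    lam = inverse([[q[i][j] for j in keep] for i in keep])
    return n_obs * sum(lam[a][b] * xbar[i] * xbar[j]
                       for a, i in enumerate(keep) for b, j in enumerate(keep))

def choose_hypothesis(q, xbar, n_obs, critical):
    """Return the indices of the means judged non-zero (empty tuple for H_0)."""
    p = len(xbar)
    for k in range(p):  # k = number of variates ignored at this step
        ratios = {s: student_ratio(q, xbar, n_obs, s)
                  for s in combinations(range(p), k)}
        best = min(ratios, key=ratios.get)   # smallest ratio at this step
        if ratios[best] <= critical[p - k]:  # not significant: stop here
            return best
    return tuple(range(p))                   # every step was significant

# Illustrative inputs: independent variates, N = 10, hypothetical significance points.
q_example = [[1.0, 0.0], [0.0, 1.0]]
crit_example = {2: 5.99, 1: 3.84}
decision = choose_hypothesis(q_example, [1.0, 0.05], 10, crit_example)
```

In this illustration `decision` is `(0,)`: the overall ratio is significant, but after ignoring the first variate the remaining ratio is not, so only the first mean is judged different from zero.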

5. Concluding remarks. It should be pointed out that while the derivation of the explicit inequalities defining the various regions of acceptance may be rather involved, for any given sample point $E$, it is relatively simple to determine the region of acceptance to which this point $E$ belongs. That is, we calculate the various values $R_{i_1 i_2 \cdots i_k}(E)$ and choose the decision $H_{j_1 j_2 \cdots j_s}$ if $R_{j_1 j_2 \cdots j_s}(E)$ is the minimum of the values of $R_{i_1 i_2 \cdots i_k}(E)$ for all values of $i_1, i_2, \cdots, i_k$. For making a decision on the basis of a given sample point $E$, it is not necessary to find explicit analytic formulas defining the shapes of the various regions of acceptance.

Since the principle used here is proposed merely as a substitute for Wald's principle for the sake of mathematical simplification, it is felt that in certain problems Wald's principle may be used as a check on the results. For example, it is felt that the new principle is apt to lead to decision regions of the proper shape though the exact sizes of these regions may not be correct. In cases where the decision regions cannot be determined by Wald's principle, it seems possible that a determination may be made in Wald's sense among the various decision regions having the same shapes as those given by the new principle. In the case considered here, for example, it may be possible to determine new values of $r_0^2, r_1^2, \cdots, r_{p-1}^2$.

I should like to express my very great appreciation to Professor H. Hotelling 
for many suggestions during the preparation of this paper and to Professor A. 
Wald for constant guidance. I should also like to credit Professor Helen Walker 
with originally posing the question that led to this research.

A TWO-SAMPLE TEST FOR A LINEAR HYPOTHESIS WHOSE POWER
IS INDEPENDENT OF THE VARIANCE

By Charles Stein
Asheville, N. C.
1. Introduction. In a paper in the Annals of Mathematical Statistics, Dantzig [1] proves that, for a sample of fixed size, there does not exist a test for Student's hypothesis whose power is independent of the variance. Here, a two-sample test with this property will be presented, the size of the second sample depending upon the result of the first. The problem of determining confidence intervals, of preassigned length and confidence coefficient, for the mean of a normal distribution with unknown variance is solved by the same procedure. These considerations, including the non-existence of a single-sample test whose power is independent of the variance, are extended to the case of a linear hypothesis. In order to make the power of a test or the length of a confidence interval exactly independent of the variance, it appears necessary to waste a small part of the information. Thus, in practical applications, one will not use a test with this property, but rather a test which is uniformly more powerful, or an interval of the same length, whose confidence coefficient is a function of $\sigma$, but always greater than the desired value, the difference usually being slight, at the same time reducing the expected number of observations by a small amount.

Any two-sample procedure, such as that discussed in this paper, can be considered a special case of sequential analysis developed by Wald [5].

The problem of whether these tests and confidence intervals are in any sense optimum is unsolved. It is difficult even to formulate a definition of an optimum among sequential tests of a hypothesis against multiple alternatives. However, it is shown that, if the variance and initial sample size are sufficiently large, the expected number of observations differs only slightly from the number of observations required for a single-sample test when the variance is known. It also seems likely that the confidence intervals do possess some optimum property among the class of all two-sample procedures.

Although Student's hypothesis is a special case of a linear hypothesis, it is treated separately, because it illustrates the basic idea without any complicated notation or new distributions. The test for Student's hypothesis involves the use only of Student's distribution, even for the power of the test, while the power function of the test proposed here for a linear hypothesis involves a new type of non-central $F$-distribution.

The notation $\chi_n^2$ is used as a generic symbol for a random variable equal to the sum of squares of $n$ independently normally distributed random variables with mean 0 and variance 1, i.e., $\chi_n^2$ has the $\chi^2$ distribution with $n$ degrees of freedom,

$$P\{\chi_n^2 < T\} = \begin{cases} \dfrac{1}{(\sqrt{2})^{n}\,\Gamma(\frac{1}{2}n)} \displaystyle\int_0^T e^{-\frac{1}{2}u}\,u^{\frac{1}{2}(n-2)}\,du & \text{for } T > 0 \\[2ex] 0 & \text{for } T \leq 0. \end{cases}$$


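The notation $t_n$ is used as a generic symbol for $x/\sqrt{\chi_n^2/n}$, where $x$ is normally distributed with mean 0 and variance 1, independently of $\chi_n^2$, i.e., $t_n$ has the distribution of Student's $t$ with $n$ degrees of freedom,

$$P\{t_n < T\} = \frac{\Gamma(\frac{1}{2}(n + 1))}{\sqrt{n\pi}\;\Gamma(\frac{1}{2}n)} \int_{-\infty}^{T} \left(1 + \frac{t^2}{n}\right)^{-\frac{1}{2}(n+1)} dt.$$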


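$F_{m,n}$ is a generic symbol for a random variable of the form $F_{m,n} = n\chi_m^2/m\chi_n^2$, the numerator and denominator being independently distributed, i.e., $F_{m,n}$ has the distribution of an $F$-ratio with $m$ and $n$ degrees of freedom,

$$P\{F_{m,n} < T\} = \frac{\Gamma(\frac{1}{2}(m + n))}{\Gamma(\frac{1}{2}m)\,\Gamma(\frac{1}{2}n)} \int_0^T \left(\frac{m}{n}\right)^{\frac{1}{2}m} F^{\frac{1}{2}(m-2)} \left(1 + \frac{m}{n}F\right)^{-\frac{1}{2}(m+n)} dF.$$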


A symbol of the above type with an additional subscript $\alpha$ denotes the upper $100\alpha\%$ significance level, e.g., $t_{n,\alpha}$ is defined by

$$P\{t_n > t_{n,\alpha}\} = \alpha.$$

The symbol $E\{x \mid Q(x)\}$ denotes the set of all $x$ such that the condition $Q(x)$ holds. This should not be confused with $E(x \mid T)$, which denotes the expected value of a random variable $x$, given the conditions $T$.

The size of a critical region is the probability that the sample point will lie within the region under the null hypothesis. The terms length and volume, as applied to confidence regions, are used in the ordinary geometrical sense.

2. The test for Student's hypothesis. Suppose $x_i$, $i = 1, 2, \cdots$ are independently normally distributed with mean $\xi$ and variance $\sigma^2$. We wish to test the hypothesis $\xi = \xi_0$, the power of the test to depend only upon $\xi - \xi_0$, not upon $\sigma^2$. For this purpose we define a statistic $t'$ as follows. A sample of $n_0$ observations, $x_1, \cdots, x_{n_0}$, is taken, and the sample estimate, $s^2$, of the variance computed by

$$s^2 = \frac{1}{n_0 - 1} \left\{ \sum_{i=1}^{n_0} x_i^2 - \frac{1}{n_0}\left(\sum_{i=1}^{n_0} x_i\right)^2 \right\}. \tag{1}$$

Then $n$ is defined by

$$n = \max\left\{ \left[\frac{s^2}{z}\right] + 1,\; n_0 + 1 \right\} \tag{2}$$

where $z$ is a previously specified positive constant, $[q]$ denoting the greatest integer less than $q$. Additional observations, $x_{n_0+1}, \cdots, x_n$, are taken, and, in accordance with an initially specified rule depending only upon $s^2$, real numbers $a_i$, $i = 1, \cdots, n$, are chosen in such a way that

$$\sum_{i=1}^{n} a_i = 1, \qquad a_1 = a_2 = \cdots = a_{n_0}, \qquad s^2 \sum_{i=1}^{n} a_i^2 = z. \tag{3}$$

This is clearly possible since

$$\min \sum_{i=1}^{n} a_i^2 = \frac{1}{n} \leq \frac{z}{s^2} \quad \text{by (2)}, \tag{4}$$

the minimum being taken subject to the conditions

$$\sum_{i=1}^{n} a_i = 1, \qquad a_1 = a_2 = \cdots = a_{n_0}.$$
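Then $t'$ is defined by

$$t' = \frac{\sum_{i=1}^{n} a_i x_i - \xi_0}{\sqrt{z}} = \frac{\sum_{i=1}^{n} a_i (x_i - \xi)}{\sqrt{z}} + \frac{\xi - \xi_0}{\sqrt{z}} = u + \frac{\xi - \xi_0}{\sqrt{z}} \tag{5}$$

where

$$u = \frac{\sum_{i=1}^{n} a_i (x_i - \xi)}{\sqrt{z}}. \tag{6}$$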
Then $u$ has the distribution of Student's $t$ with $n_0 - 1$ degrees of freedom,
regardless of the value of $\sigma^2$. For $(n_0 - 1)s^2/\sigma^2$ has the distribution
of $\chi^2_{n_0-1}$ and the conditional distribution of
$u = \sum a_i (x_i - \xi)/\sqrt{z}$, given $s$, is normal with mean 0 and variance
$\sigma^2 \sum a_i^2 / z = \sigma^2/s^2$. But the usual form of a random variable
$t_{n_0-1}$ is $t_{n_0-1} = y/s$, $y$ being normally distributed with mean 0 and
variance $\sigma^2$, and $(n_0 - 1)s^2/\sigma^2$ having the distribution of
$\chi^2_{n_0-1}$, independent of $y$. Thus the conditional distribution of
$t_{n_0-1}$, given $s$, is also normal with mean 0 and variance $\sigma^2/s^2$,
so that $t_{n_0-1}$ and $u$ have the same distribution.
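
The two-stage rule (1)-(4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text requires some preassigned rule for the $a_i$ but does not fix one, so the sketch assumes equal weight `a` on the first $n_0$ observations and equal weight `b` on the additional ones, and solves the resulting quadratic. The function name `two_stage_plan` is hypothetical.

```python
import math
import statistics


def two_stage_plan(first_sample, z):
    """Return (n, weights, s2) for the two-stage procedure of equations (1)-(4).

    Assumption of this sketch (not prescribed by the text): the first n0
    weights equal a, the remaining n - n0 weights equal b.
    """
    n0 = len(first_sample)
    s2 = statistics.variance(first_sample)   # (1): divisor n0 - 1
    q = s2 / z
    bracket = math.ceil(q) - 1               # [q]: largest integer less than q
    n = max(bracket + 1, n0 + 1)             # (2)
    k = n - n0                               # number of additional observations
    c = z / s2                               # required value of sum(a_i^2), from (3)
    # Solve n0*a + k*b = 1 and n0*a^2 + k*b^2 = c for a (smaller root) and b.
    disc = max(0.0, k * (n * c - 1.0) / n0)  # non-negative by (4); guard rounding
    a = (1.0 - math.sqrt(disc)) / n
    b = (1.0 - n0 * a) / k
    return n, [a] * n0 + [b] * k, s2
```

For example, `two_stage_plan([4.1, 5.3, 6.2, 3.8, 5.9, 4.4], 0.1)` gives a total sample size of 10, and the returned weights satisfy the three conditions in (3) up to rounding.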

This theorem can be used to obtain an unbiased test for the hypothesis $H_0$
that $\xi = \xi_0$, the power being independent of $\sigma^2$, which is supposed
unknown. Let $\alpha$ be the desired size of the critical region and let
$t_{n_0-1,\alpha/2}$ be such that

(7) $P\{t_{n_0-1} > t_{n_0-1,\alpha/2}\} = \frac{\alpha}{2}.$

Then if we reject $H_0$ whenever

(8) $\left|\frac{\sum a_i x_i - \xi_0}{\sqrt{z}}\right| > t_{n_0-1,\alpha/2},$

we obtain an unbiased test of $H_0$, whose power function is $1 - \beta(\xi)$ where

(9) $\beta(\xi) = P\left\{-t_{n_0-1,\alpha/2} + \frac{\xi_0 - \xi}{\sqrt{z}} < t_{n_0-1} < t_{n_0-1,\alpha/2} + \frac{\xi_0 - \xi}{\sqrt{z}}\right\}.$

The fact that the test is unbiased follows immediately from the symmetry and
unimodality of the $t$ distribution.

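If we wish to test the hypothesis $H_0: \xi = \xi_0$ against one-sided alternatives
$\xi > \xi_0$, the procedure is similar. The critical region of size $\alpha$ is
defined by

(10) $\frac{\sum a_i x_i - \xi_0}{\sqrt{z}} > t_{n_0-1,\alpha}$

and the power function is

(11) $1 - \beta(\xi) = P\left\{t_{n_0-1} > t_{n_0-1,\alpha} - \frac{\xi - \xi_0}{\sqrt{z}}\right\}.$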


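A confidence interval for $\xi$, of predetermined length $l$ and confidence
coefficient $1 - \alpha$ can be obtained by selecting $z$ so that

(12) $1 - \alpha = P\left\{-\frac{l}{2\sqrt{z}} < t_{n_0-1} < \frac{l}{2\sqrt{z}}\right\} = P\left\{\left|\frac{\sum a_i (x_i - \xi)}{\sqrt{z}}\right| < \frac{l}{2\sqrt{z}}\right\}$

$\qquad = P\left\{\left|\sum a_i x_i - \xi\right| < \frac{l}{2}\right\} = P\left\{\sum a_i x_i - \frac{l}{2} < \xi < \sum a_i x_i + \frac{l}{2}\right\},$

where $\xi$ is the true mean of the distribution. Thus
$(\sum a_i x_i - l/2,\ \sum a_i x_i + l/2)$ is the desired confidence interval.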

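In the above tests and confidence intervals, the distribution of the required
number of observations, $n$, is

(13) $P\{n = n_0 + 1\} = P\left\{\left[\frac{s^2}{z}\right] + 1 \le n_0 + 1\right\} = P\{(n_0 - 1)s^2/\sigma^2 < (n_0 + 1)(n_0 - 1)z/\sigma^2\}$

$\qquad = P\{\chi^2_{n_0-1} < y\} = \frac{1}{(\sqrt{2})^{n_0-1}\,\Gamma(\tfrac12(n_0 - 1))} \int_0^{y} e^{-u/2} u^{\frac12(n_0-3)}\, du,$

where $y = (n_0^2 - 1)z/\sigma^2$,

(14) $P\{n = \nu\} = P\left\{\nu < \frac{s^2}{z} + 1 < \nu + 1\right\} = P\{(\nu - 1)(n_0 - 1)z/\sigma^2 < \chi^2_{n_0-1} < \nu(n_0 - 1)z/\sigma^2\}$

$\qquad = \frac{1}{(\sqrt{2})^{n_0-1}\,\Gamma(\tfrac12(n_0 - 1))} \int_{(\nu-1)(n_0-1)z/\sigma^2}^{\nu(n_0-1)z/\sigma^2} e^{-u/2} u^{\frac12(n_0-3)}\, du,$

for integral $\nu > n_0 + 1$, all other values being impossible. Thus the expected
number of observations, $E(n)$, satisfies the inequalities

(15) $\frac{1}{(\sqrt{2})^{n_0-1}\,\Gamma(\tfrac12(n_0 - 1))} \left\{ \int_0^{y} (n_0 + 1)\, e^{-u/2} u^{\frac12(n_0-3)}\, du + \int_y^{\infty} \frac{\sigma^2 u}{(n_0 - 1)z}\, e^{-u/2} u^{\frac12(n_0-3)}\, du \right\} < E(n)$

$\qquad < \frac{1}{(\sqrt{2})^{n_0-1}\,\Gamma(\tfrac12(n_0 - 1))} \left\{ \int_0^{y} (n_0 + 1)\, e^{-u/2} u^{\frac12(n_0-3)}\, du + \int_y^{\infty} \left(\frac{\sigma^2 u}{(n_0 - 1)z} + 1\right) e^{-u/2} u^{\frac12(n_0-3)}\, du \right\},$

which can be rewritten

(16) $(n_0 + 1)P\{\chi^2_{n_0-1} < y\} + \frac{\sigma^2}{z} P\{\chi^2_{n_0+1} > y\} < E(n) < (n_0 + 1)P\{\chi^2_{n_0-1} < y\} + \frac{\sigma^2}{z} P\{\chi^2_{n_0+1} > y\} + P\{\chi^2_{n_0-1} > y\}.$

Consequently $E(n)$ is a function of $\sigma^2$, and can be evaluated from tables of the
incomplete $\Gamma$ function.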
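
As a numerical illustration of the bounds (16), the following Python sketch evaluates both sides without tables. It assumes, purely for convenience, that $n_0$ is odd, so that both chi-square tail probabilities have even degrees of freedom and reduce to a finite Poisson sum; the function names are hypothetical.

```python
import math


def chi2_upper_tail_even(df, y):
    """P{chi-square with even df > y}, using the closed-form Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs an even, positive df")
    half = y / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term              # adds (y/2)^j / j!
        term *= half / (j + 1)
    return math.exp(-half) * total


def expected_n_bounds(n0, z, sigma2):
    """Lower and upper bounds on E(n) from (16); n0 assumed odd (sketch only)."""
    if n0 % 2 == 0:
        raise ValueError("sketch restricted to odd n0")
    y = (n0 + 1) * (n0 - 1) * z / sigma2
    tail_low = chi2_upper_tail_even(n0 - 1, y)    # P{chi2_{n0-1} > y}
    tail_high = chi2_upper_tail_even(n0 + 1, y)   # P{chi2_{n0+1} > y}
    lower = (n0 + 1) * (1.0 - tail_low) + (sigma2 / z) * tail_high
    return lower, lower + tail_low
```

With $n_0 = 11$, $z = 1$ and $\sigma^2 = 100$ the two bounds come out close to 100 and 101, so $E(n)$ exceeds $\sigma^2/z$ by less than about one observation.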

As mentioned in the introduction, these tests and confidence intervals will
not be used exactly in this form, since they waste information in order to make
the power of the test or the length of the confidence interval strictly independent
of the variance. Instead of (2) we take a total of

(17) $n = \max\left\{\left[\frac{s^2}{z}\right] + 1,\; n_0\right\}$

observations, and define

(18) $t'' = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i - \xi_0}{s}\sqrt{n} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i - \xi}{s}\sqrt{n} + \frac{\xi - \xi_0}{s}\sqrt{n} = u'' + \frac{\xi - \xi_0}{s}\sqrt{n}.$

By the same reasoning as that following (6), $u''$ has the $t$ distribution with
$n_0 - 1$ degrees of freedom. By (2)

(19) $n \ge s^2/z$

so that, although $\frac{\xi - \xi_0}{s}\sqrt{n}$ is a random variable,

(20) $\left|\frac{\xi - \xi_0}{s}\sqrt{n}\right| \ge \left|\frac{\xi - \xi_0}{\sqrt{z}}\right|.$

Thus, if we use

(21) $|t''| > t_{n_0-1,\alpha/2}$ or $t'' > t_{n_0-1,\alpha}$

instead of (8) or (10) respectively, we shall always increase the power of the test.
Also the expected number of observations will be reduced from that in (16) by
$P\{\chi^2_{n_0-1} < y\}$. Similarly if $z$ is defined as in (12), the interval

$\left(\frac{1}{n}\sum_{i=1}^{n} x_i - \frac{l}{2},\ \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{l}{2}\right)$

has length $l$, and the probability that it covers the true mean $\xi$ is a function
of $\sigma$, but is always greater than $1 - \alpha$, and differs only slightly from
$1 - \alpha$ if $\sigma^2 > n_0 z$. Thus it can be used instead of the confidence
interval (12).

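From (16) it follows that

$\limsup_{\sigma^2 \to \infty} \left[E(n) - \frac{\sigma^2}{z}\right] \le 1, \qquad \liminf_{\sigma^2 \to \infty} \left[E(n) - \frac{\sigma^2}{z}\right] \ge 0,$

the approximation $E(n) \approx \sigma^2/z$ being fair provided $\sigma^2 > z n_0$. The
length of the confidence interval (12) is given by

$l = 2\, t_{n_0-1,\alpha/2} \sqrt{z} \approx \frac{2\sigma\, t_{n_0-1,\alpha/2}}{\sqrt{E(n)}}.$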

When the variance $\sigma^2$ is known, the length of the single-sample confidence
interval of confidence coefficient $1 - \alpha$ obtained on the basis of $n$
observations is given by

$1 - \alpha = \frac{1}{\sqrt{2\pi}} \int_{-l\sqrt{n}/2\sigma}^{\,l\sqrt{n}/2\sigma} e^{-x^2/2}\, dx,$

i.e.,

$l = 2\, t_{\infty,\alpha/2}\, \sigma/\sqrt{n}.$

Since, even for moderate values of $n_0$, say $n_0 > 30$, $t_{n_0-1,\alpha/2}$ differs
only slightly from $t_{\infty,\alpha/2}$, the expected number of observations for a
confidence interval of given length and confidence coefficient is only slightly
larger than the fixed number of observations required in the single-sample case
when the variance is known, provided the variance is moderately large.

3. Distribution of a non-central F-ratio. In the extension of the above
considerations to the testing of a general linear hypothesis, the power function
depends on the distribution of a quantity

(22) $F' = \sum_{i=1}^{m} (q_i - c_i)^2$

where $q_i = x_i/\sqrt{r}$, the $x_i$ being independently normally distributed with
mean 0 and variance 1, and $r$ having the $\chi^2_n$ distribution, independently of
the $x_i$. The $c_i$ are real constants.

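Let

(23) $\eta = \frac{\sum_{i=1}^{m} c_i x_i}{\sqrt{\sum c_i^2}},$

(24) $\chi^2 = \sum_{i=1}^{m} \left(x_i - \frac{c_i \eta}{\sqrt{\sum c_i^2}}\right)^2 = \sum_{i=1}^{m} (x_i - c_i\sqrt{r})^2 - \left(\eta - \sqrt{r}\sqrt{\textstyle\sum c_i^2}\right)^2.$

Now, $\sum (x_i - c_i\eta/\sqrt{\sum c_i^2})^2$ is a quadratic form of rank $m - 1$
since the $x_i - c_i\eta/\sqrt{\sum c_i^2}$ are subject to one linear homogeneous
restriction, namely $\sum_{i=1}^{m} c_i (x_i - c_i\eta/\sqrt{\sum c_i^2}) = 0$.
Also $\eta^2$ is of rank 1, and $\chi^2 + \eta^2 = \sum_{i=1}^{m} x_i^2$, so by
Cochran's Theorem, $\chi^2$ and $\eta^2$ are independently distributed as
$\chi^2_{m-1}$ and $\chi^2_1$ respectively. Thus there exist $y_1, \ldots, y_m$,
independently normally distributed with mean 0 and variance 1 such that

(25) $\chi^2 = y_2^2 + \cdots + y_m^2, \qquad \eta = y_1.$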

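Let $u_i = y_i/\sqrt{r}$. Then the joint distribution of $u_1, \ldots, u_m$ is given by

(26) $P\{u_1 < \tau_1, \ldots, u_m < \tau_m\} = \frac{1}{(\sqrt{2\pi})^m (\sqrt{2})^n\, \Gamma(\tfrac12 n)} \int_0^{\infty} e^{-\frac12 r} r^{\frac12(n-2)}\, dr \int_{-\infty}^{\tau_1\sqrt{r}} \cdots \int_{-\infty}^{\tau_m\sqrt{r}} e^{-\frac12 \sum y_i^2}\, dy_1 \cdots dy_m.$

The density function is given by

(27) $\frac{\partial^m P\{u_1 < \tau_1, \ldots, u_m < \tau_m\}}{\partial\tau_1 \cdots \partial\tau_m} = \frac{1}{(\sqrt{2\pi})^m (\sqrt{2})^n\, \Gamma(\tfrac12 n)} \int_0^{\infty} e^{-\frac12 r} r^{\frac12(n-2)}\, r^{\frac12 m}\, e^{-\frac12 r \sum \tau_i^2}\, dr$

$\qquad = \frac{1}{(\sqrt{\pi})^m\, 2^{\frac12(m+n)}\, \Gamma(\tfrac12 n)} \int_0^{\infty} e^{-\frac12 r (1 + \sum \tau_i^2)}\, r^{\frac12(n+m-2)}\, dr = \frac{\Gamma(\tfrac12(n + m))}{(\sqrt{\pi})^m\, \Gamma(\tfrac12 n)} \left(1 + \sum_{i=1}^{m} \tau_i^2\right)^{-\frac12(m+n)}.$

Then let

(28) $\eta' = \frac{\eta}{\sqrt{r}} = \frac{y_1}{\sqrt{r}} = u_1, \qquad \tau'^2 = \frac{\chi^2}{r} = u_2^2 + \cdots + u_m^2.$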

The joint distribution of $\eta'$ and $\tau'^2$ is thus, by (27),

PW < V, r' 2 < r 2 } 

_ r(«m + n)) f [ f ( A A-> ( “+”’ 

“ (v / ir) m r(i n ) J J ' J V + ? 7 dMl ’' ’ 


(29) 


« 1 <* S “?< r2 
2 

r(J(»» +»)) f f f ,, , 2\- 
(V;) m r(|n) J J J (1+Ml) 


■l(»ln)+i(«H) 


«1<7. S»J<rV(l+uJ) 
2 


r(K»» + »)) 
(\4)"r(W 


0+?»■) 


■ J(m+») 


dui tfy 2 • • • di/n, 


//•••/ 


-l(n+l) 


Sv5<r*/(1+«S) 


( i+ ?4 


i (m-f w) 


dui dy 2 ■■■ dy m . 


In order to evaluate this integral, we use the fact that the distribution of a ratio
of $\chi^2_{m-1}$ to $\chi^2_{n+1}$, the two being independent, can be expressed in
two forms, by (27) and Wilks [2], p. 114,

P\Xm-vXn+\ < ~ _ imvife j. iri Jr # v (t + t 9 ) 


(30) 


r(*(m - i))r(j(n + 1)) 
r(|(m + n)) 


(Vi)"- l r(K» + D) 


/ - / m —l \ —l(m+n) 

" ■ / V 1 + ? 9 7 dq! dq m , 


s «<<* 


so that 

PW < V, t'* < r 2 ) 

+ ”)) 


(31) 


V»r(jn)r(K»» - 1 )) 

i* r*/(l+«J) 

x / / 

• r «i<w •'r-o 

r(K>» + »)) 


(1 + M?)-‘ ( " +1 V itm_,) (l + dp dut 


V*rQn)r(i(m - 1)) 

x f f (i + w 2 r i(n+1) f i(m_8) (i + rz~(! + w 2 ) _ ‘ <m-8)_1 dr 

Jm— so •'f-o \ j ~r w / 

- Virit'ir^ 1 - l j) C + “’ + * *■ 

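Now we wish to find the distribution of

(32) $F' = \sum_{i=1}^{m} (q_i - c_i)^2 = \frac{\sum (x_i - c_i\sqrt{r})^2}{r} = \frac{\chi^2 + \left(\eta - \sqrt{r}\sqrt{\sum c_i^2}\right)^2}{r} = \tau'^2 + \left(\eta' - \sqrt{\textstyle\sum c_i^2}\right)^2.$

Carrying out the transformation (32), it is found that the joint density function
of $\eta'$ and $F'$ is

(33) $p(\eta', F')\, d\eta'\, dF' = \frac{\Gamma(\tfrac12(m + n))}{\sqrt{\pi}\, \Gamma(\tfrac12 n)\, \Gamma(\tfrac12(m - 1))} \left[F' - \left(\eta' - \sqrt{\textstyle\sum c_i^2}\right)^2\right]^{\frac12(m-3)}$

$\qquad\qquad \times \left[1 + \eta'^2 + F' - \left(\eta' - \sqrt{\textstyle\sum c_i^2}\right)^2\right]^{-\frac12(m+n)} d\eta'\, dF'$

$\qquad = \frac{\Gamma(\tfrac12(m + n))}{\sqrt{\pi}\, \Gamma(\tfrac12 n)\, \Gamma(\tfrac12(m - 1))} \left[F' - \rho^2\right]^{\frac12(m-3)} \left[1 + F' + 2\rho\sqrt{\textstyle\sum c_i^2} + \textstyle\sum c_i^2\right]^{-\frac12(m+n)} d\rho\, dF',$

where $\rho = \eta' - \sqrt{\sum c_i^2}$. In order to obtain the distribution of $F'$
we must integrate out $\rho$ over $-\sqrt{F'} < \rho < \sqrt{F'}$, obtaining

(34) $P\{F' < T\} = \Phi_{m,n}(T, \textstyle\sum c_i^2) = \frac{\Gamma(\tfrac12(m + n))}{\sqrt{\pi}\, \Gamma(\tfrac12 n)\, \Gamma(\tfrac12(m - 1))} \int_{F'=0}^{T} \int_{\rho=-\sqrt{F'}}^{\sqrt{F'}} \left[F' - \rho^2\right]^{\frac12(m-3)} \left[1 + F' + 2\rho\sqrt{\textstyle\sum c_i^2} + \textstyle\sum c_i^2\right]^{-\frac12(m+n)} d\rho\, dF'.$

In the case $\sum c_i^2 = 0$, (34) reduces to the distribution of the ratio
$\chi^2_m/\chi^2_n$.

4. Test of a linear hypothesis. In this case the power of the test usually
employed is affected not only by the variance, but also by the values of the
predictors. In order to avoid this difficulty, it will be assumed that only a
predetermined number of different sets of predictors are used, and that these sets
are repeated as a whole, as many times as is necessary. This covers, in particular,
the replication of orthogonal designs for the analysis of variance.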

Let $y_{ij}$, $i = 1, \ldots, m$, $j = 1, 2, \ldots$ be independently normally
distributed with means

(35) $E\, y_{ij} = \sum_{k=1}^{\mu} a_k x_{ki}, \qquad \mu < m, \qquad \operatorname{rank}\,(x_{ki}) = \mu,$

and variance $\sigma^2$, the $x_{ki}$ being given in advance, $\sigma^2$ and $a_k$
unknown. We wish to test

(36) $H_0: \sum_{k=1}^{\mu} c_{lk} a_k = c_{l0}, \qquad l = 1, \ldots, r \le \mu,$

where we may suppose equations (36) linearly independent, the $c_{lk}$ being given
constants. It will be convenient to reduce this to a canonical form, as in Tang [3].
First, by a non-singular linear transformation


(37) 

we can make 

m 

(38) 



x ki = 2 bkiZu 
z -1 


en) = the fi X fi identity matrix, 


any two sets of b k i that accomplish, this being related by an orthogonal trans¬ 
formation. Then (35) becomes 


(39) 

and (34) becomes 

(40) 


Eyu = 2 a * ^2 

fc-i y«i 


M / M \ M ^ 

^ ^ . dk b kl ) = ^ Q'k&ki , 

i-1 \fc-i / *-1 


CZO = ^2 Clk Q k = ^2 Clk ^ Om b mk 
fc -1 m -1 


fc -1 

M 


m—1 A--“l 


M 

I 

m—>1 


]£ c I« a m, z — 1 ••• r < M, 


where b are such that 2 b^bkt = $ mi , the Kronecker delta, or, in matrix notation 
(&*•»)*"* = (b km ). Next, the equations (40) can be made into an orthonormal set 


tf 

CIO 


£ C«m <4 


(41)

i.e., one in which


// W . 
2L* c km Clm “ On 


by a non-singular linear transformation on the c\ m . Clearly ScJo* is an invariant 
of (41), i.e., it does not depend upon the choice of a particular transformation 
(37), or of a particular transformation of the c[ m into c" m , since, in both cases, 
all admissible transformations are connected by an orthogonal transformation. 
Then we define 

m 

(43) y'ii - i = 1. -*•,#* 

(44) y'a = 22 d<,y 4 /, * « /» + 1, m 

9-1 


te) ■ 


in such a way that I * 9 ) is an orthogonal matrix which is possible, by (38). 


m m a 

Ey'u = 23 Ey,i « 23 *i« 23 *** a* 


«-l fl-1 fe-l 

M 

22 «* 52 = <*< for t = 1 , ■ • •, Mi 

fc—*1 <r—i 

= 2 d iq Ey,i = 22 d<» £ **« «* 

< Z «=1 qr —*1 

fi m 

= 52 a * 52 di,tkq = 0 for i = M + 1 , • • •, m. 

fc** 1 «7“*J 


Finally we define 
(47) 


// / 
lU) y 


i ~ fJL -f“ 1 • • • , TO 


2 C<m 2 /my, i - 1 , • ■ • , r 


//*? — ^ Gimymj > t r *4" 1, * * * , fly 


where the c im are such that 




an orthogonal matrix. Since the transforma¬ 


tion applied to the y,j to obtain ya is orthogonal, the y ti are independently 
normally distributed with variance o'. Also 

(50) Ey"n - 0, i = m + 1, " • , < 

(51) Ey"j = £ c im ai * c ioN * = 1, • • •, r 


(52) 


Ey’j = 22. Ci»o, 


$i = r + 1, \ldots, \mu.$

Since (50), (51), (52) were obtained from the original formulation by a
non-singular linear transformation, the derivation can be reversed, which implies
the equivalence of (50), (51), (52) to the problem as originally formulated.
Thus we can restate the problem in the following manner. Let $y_{ij}$,
$i = 1, \ldots, t$, $j = 1, 2, \ldots$ be independently normally distributed with
variance $\sigma^2$ and means

(53) $E\, y_{ij} = \xi_i, \quad i = 1, \ldots, \mu; \qquad E\, y_{ij} = 0, \quad i = \mu + 1, \ldots, t,$

$\xi_i$ and $\sigma$ unknown.

We wish to test

(54) $H_0: \xi_i = 0, \quad i = 1, \ldots, p \le \mu,$

the $\xi_i$ for $i = p + 1, \ldots, \mu$ and $\sigma^2$ being nuisance parameters.

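Obtain a first sample $y_{ij}$, $i = 1, \ldots, t$, $j = 1, \ldots, n_0$. Estimate the
variance by

(55) $s^2 = \frac{1}{n_0 t - \mu} \left\{ \sum_{i=1}^{t} \sum_{j=1}^{n_0} y_{ij}^2 - \frac{1}{n_0} \sum_{i=1}^{\mu} \Big( \sum_{j=1}^{n_0} y_{ij} \Big)^2 \right\}.$

Let $z$ be a predetermined constant, and $n$ be defined by

(56) $n = \max\left\{\left[\frac{s^2}{z}\right] + 1,\; n_0 + 1\right\}.$

After $s^2$ has been obtained, determine a set of real numbers, $a_1, \ldots, a_n$, in
accordance with a preassigned rule, so as to satisfy

(57) $\sum_{j=1}^{n} a_j = 1, \qquad s^2 \sum_{j=1}^{n} a_j^2 = z, \qquad a_1 = a_2 = \cdots = a_{n_0}.$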


Then

(58) $F' = \frac{\sum_{i=1}^{p} \Big( \sum_{j=1}^{n} a_j y_{ij} \Big)^2}{z\,(n_0 t - \mu)}$

has the non-central F-distribution given by (34) with $n = n_0 t - \mu$, $m = p$ and

(59) $\sum_{i=1}^{p} c_i^2 = \sum_{i=1}^{p} \xi_i^2 / (n_0 t - \mu) z,$

where $\xi_i$ are the true means, allowing for the possibility that $H_0$ is not true.
For, $(n_0 t - \mu)s^2/\sigma^2$ has the distribution of $\chi^2_{n_0 t - \mu}$, and,
after it has been determined, $\sum_j a_j y_{ij} - \xi_i$, $i = 1, \ldots, p$, are
independently normally distributed with mean 0 and variance
$\sigma^2 \sum a_j^2 = \sigma^2 z/s^2$, so that, given $s^2$,
$(\sum_j a_j y_{ij} - \xi_i)/\sqrt{z}$, $i = 1, \ldots, p$ are independently normally
distributed with mean 0 and variance $\sigma^2/s^2$. But the random variables $u_i$
in section 3 are of the form $x_i/\sqrt{r}$ where the $x_i$ are independently normally
distributed with mean 0 and variance $\sigma^2$, while $r/\sigma^2$ has the
$\chi^2_{n_0 t - \mu}$ distribution independent of the $x_i$. Thus $u_i$ can be
considered to have been obtained by first selecting a stochastic variable $r$ such
that $r/\sigma^2$ has the distribution of $\chi^2_{n_0 t - \mu}$ and then selecting
$u_i$ to be independently normally distributed, given $r$, with mean 0 and variance
$\sigma^2/r$. Since $r$ corresponds with $(n_0 t - \mu)s^2$, comparing this with the
above, we find that

$\frac{\sum_{j=1}^{n} a_j y_{ij} - \xi_i}{\sqrt{z}\,\sqrt{n_0 t - \mu}}, \qquad i = 1, \ldots, p$

have the same joint distribution as the $u_i$. The $\xi_i/\sqrt{(n_0 t - \mu) z}$ are
constants, so

$F' = \frac{\sum_{i=1}^{p} \Big( \sum_j a_j y_{ij} \Big)^2}{z\,(n_0 t - \mu)} = \sum_{i=1}^{p} \left[ \frac{\sum_j a_j y_{ij} - \xi_i}{\sqrt{z (n_0 t - \mu)}} + \frac{\xi_i}{\sqrt{z (n_0 t - \mu)}} \right]^2$

has the same distribution (34) as $\sum (u_i - c_i)^2$ with
$c_i = \xi_i/\sqrt{(n_0 t - \mu) z}$.

The tests of significance and confidence regions are obtained by a procedure
completely analogous to that used in the case of Student's hypothesis. If we
define $k = F_{p,\, n_0 t - \mu,\, \alpha}$ by

(62) $P\{F_{p,\, n_0 t - \mu} > k\} = \alpha,$

then a critical region of size $\alpha$ for testing $H_0$ is given by

(63) $F' > k.$


Its power function is 


(64) /• 

Similarly, a confidence region for $\xi_i$, $i = 1, \ldots, p$, of confidence
coefficient $1 - \alpha$ is given by the set of all $\xi_i$ such that

$\frac{\sum_{i=1}^{p} \Big( \sum_j a_j y_{ij} - \xi_i \Big)^2}{z\,(n_0 t - \mu)} \le k.$

It is evident that this defines the interior of the hypersphere

(67) $\sum_{i=1}^{p} \Big( \xi_i - \sum_j a_j y_{ij} \Big)^2 \le k\, z\, (n_0 t - \mu)$

whose volume is independent of the variance $\sigma^2$.

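The distribution of $n$, the required number of sets of observations for the
above tests and confidence intervals is given by

(68) $P\{n = n_0 + 1\} = P\left\{\left[\frac{s^2}{z}\right] + 1 \le n_0 + 1\right\} = P\{(n_0 t - \mu)s^2/\sigma^2 < (n_0 + 1)(n_0 t - \mu)z/\sigma^2\}$

$\qquad = P\{\chi^2_{\delta} < y\} = \frac{1}{(\sqrt{2})^{\delta}\, \Gamma(\tfrac12 \delta)} \int_0^{y} e^{-u/2} u^{\frac12 \delta - 1}\, du,$

where

(69) $y = (n_0 + 1)(n_0 t - \mu)z/\sigma^2, \qquad \delta = n_0 t - \mu,$

and

(70) $P\{n = \nu\} = P\left\{\nu < \frac{s^2}{z} + 1 < \nu + 1\right\} = P\{(\nu - 1)\delta z/\sigma^2 < \chi^2_{\delta} < \nu \delta z/\sigma^2\}$

$\qquad = \frac{1}{(\sqrt{2})^{\delta}\, \Gamma(\tfrac12 \delta)} \int_{(\nu-1)\delta z/\sigma^2}^{\nu \delta z/\sigma^2} e^{-u/2} u^{\frac12 \delta - 1}\, du$

for integral $\nu > n_0 + 1$, all other values being impossible.

Thus $E(n)$ satisfies the inequalities

(71) $\frac{1}{(\sqrt{2})^{\delta}\, \Gamma(\tfrac12 \delta)} \left\{ \int_0^{y} (n_0 + 1)\, e^{-u/2} u^{\frac12 \delta - 1}\, du + \int_y^{\infty} \frac{\sigma^2 u}{\delta z}\, e^{-u/2} u^{\frac12 \delta - 1}\, du \right\} < E(n)$

$\qquad < \frac{1}{(\sqrt{2})^{\delta}\, \Gamma(\tfrac12 \delta)} \left\{ \int_0^{y} (n_0 + 1)\, e^{-u/2} u^{\frac12 \delta - 1}\, du + \int_y^{\infty} \left( \frac{\sigma^2 u}{\delta z} + 1 \right) e^{-u/2} u^{\frac12 \delta - 1}\, du \right\},$

which can be rewritten

(72) $(n_0 + 1)P\{\chi^2_{\delta} < y\} + \frac{\sigma^2}{z} P\{\chi^2_{\delta+2} > y\} < E(n) < (n_0 + 1)P\{\chi^2_{\delta} < y\} + \frac{\sigma^2}{z} P\{\chi^2_{\delta+2} > y\} + P\{\chi^2_{\delta} > y\}.$

The modifications required to avoid wasting information are exactly analogous
to those made in the case of the test for Student's hypothesis.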

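5. Non-existence of a single-sample test for a linear hypothesis whose power
is independent of the variance. The canonical form (see Tang [3]) for a linear
hypothesis in the single sample case can be derived immediately from (53) and
(54). Let $x_i$, $i = 1, \ldots, n$ be independently normally distributed with means

(73) $E\, x_i = \xi_i, \quad i = 1, \ldots, p; \qquad E\, x_i = 0, \quad i = p + 1, \ldots, n$

and variance $\sigma^2$. The $\xi_i$ and $\sigma^2$ are unknown, and we wish to test
$H_0: \xi_i = 0$, $i = 1, \ldots, p$.

The most powerful test for $H_0$ against a given alternative
$H_1: \xi_i = \xi_{i0}$, $i = 1, \ldots, p$, if the variance $\sigma^2$ is known, is that
based upon the probability ratio (see Neyman and Pearson [4])

(74) $\frac{p_1}{p_0} = \frac{\dfrac{1}{(\sqrt{2\pi}\,\sigma)^n} \exp\left[-\dfrac{1}{2\sigma^2}\Big\{\sum_{i=1}^{p} (x_i - \xi_{i0})^2 + \sum_{i=p+1}^{n} x_i^2\Big\}\right]}{\dfrac{1}{(\sqrt{2\pi}\,\sigma)^n} \exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\right]}.$

Since any strictly increasing function of $p_1/p_0$ is equivalent for this purpose,
we can use

(75) $\varphi(x_1, \ldots, x_p) = \sum_{i=1}^{p} \xi_{i0} x_i.$

The critical region of size $\alpha$ based upon $\varphi$ is given by

(76) $W_0(\sigma): \quad \sum_{i=1}^{p} \xi_{i0} x_i > k\,\sigma \sqrt{\textstyle\sum_{i=1}^{p} \xi_{i0}^2},$

where

(77) $\alpha = \frac{1}{\sqrt{2\pi}} \int_k^{\infty} e^{-\frac12 x^2}\, dx,$

since, under $H_0$, $\sum \xi_{i0} x_i$ is normally distributed with mean 0 and variance
$\sigma^2 \sum_{i=1}^{p} \xi_{i0}^2$. Under $H_1$, $\sum \xi_{i0} x_i$ is normally
distributed with mean $\sum_{i=1}^{p} \xi_{i0}^2$ and variance
$\sigma^2 \sum_{i=1}^{p} \xi_{i0}^2$. Thus the power of the test for the alternative
$H_1$ as a function of $\sigma$ is

(78) $1 - \beta_0(\sigma) = P\{W_0(\sigma) \mid \xi_i = \xi_{i0}, \sigma^2\} = \frac{1}{\sqrt{2\pi}} \int_{k - \sqrt{\sum \xi_{i0}^2}/\sigma}^{\infty} e^{-\frac12 x^2}\, dx.$

Now let us suppose there exists a test based on the critical region $W$ of size
$\alpha$ whose power $1 - \beta$ is independent of $\sigma$. Since $W_0(\sigma)$ is the
best critical region of size $\alpha$ for any $\sigma$ we must have

(79) $1 - \beta \le 1 - \beta_0(\sigma) = \frac{1}{\sqrt{2\pi}} \int_{k - \sqrt{\sum \xi_{i0}^2}/\sigma}^{\infty} e^{-\frac12 x^2}\, dx,$

so that

(80) $1 - \beta \le \operatorname*{g.l.b.}_{\sigma}\, [1 - \beta_0(\sigma)] = \frac{1}{\sqrt{2\pi}} \int_k^{\infty} e^{-\frac12 x^2}\, dx = \alpha.$

By interchanging $H_0$ and $H_1$ we can reverse the inequality (80), proving

(81) $1 - \beta = \alpha.$

Thus any single-sample test for a linear hypothesis whose power is independent
of the variance has constant power equal to the size of the critical region.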

REFERENCES

[1] George B. Dantzig, "On the non-existence of tests of 'Student's' hypothesis having
power functions independent of σ," Annals of Math. Stat., Vol. 11 (1940), p. 186.

[2] S. S. Wilks, Mathematical Statistics, Princeton, 1943.

[3] P. C. Tang, "The power function of the analysis of variance tests," Stat. Res. Mem.,
Vol. 2 (1938).

[4] Neyman and Pearson, Stat. Res. Mem., Vol. 1 (1936).

[5] A. Wald, "Sequential tests of statistical hypotheses," Annals of Math. Stat., Vol. 16,
June 1945.



COMPACT COMPUTATION OF THE INVERSE OF A MATRIX 

By Frederick V. Waugh and Paul S. Dwyer 
War Food Administration and The University of Michigan

1. Introduction. Among the most common applications of mathematics to 
practical problems are the solution of simultaneous equations, the evaluation of 
determinants, and the computation of the complete inverse, (or the complete 
adjugate), of a given matrix. Even with modern computing machines these are
laborious, time-consuming jobs. For that reason there has been great interest 
in recent years in the development of so-called “compact” methods; that is, 
methods that eliminate all unnecessary detail, that use computing machines 
to do as much of the work as possible, and that only require copying the results 
needed in further analysis. 

In 1935 a paper by one of the authors [1] and since then papers by the other 
author [2], [3], [4], [5], [6] and [7] have outlined a variety of compact methods 
and have applied them to actual problems. These papers, together with other 
recent contributions, such as those presented in [8], [9] and [10], have resulted
in much improved and more compact techniques in the general field of the solu¬ 
tion of linear simultaneous equations and allied topics, especially if the matrix 
is axi-symmetric. It is not generally recognized, however, that extension of 
these procedures (usually involving matrix factorization [7] [10]) can be used 
to compute the inverse (and adjugate) directly from the matrix factors without 
the necessity of the reduction of the unit matrix [11; 150] [2; 121] when the
matrix is non-symmetric.

The present paper extends the use of compact methods in three ways. 

(a) It presents a method of computing the inverse (and adjugate) of a sym¬ 
metric or non-symmetric matrix by compact Gaussian methods without the 
formal reduction of an auxiliary identity matrix. 

(b) It introduces the method of multiplication and subtraction with division— 
a modification of the method of multiplication and subtraction—and shows that 
the terms recorded in the compact solution are themselves determinants which 
are minors of the determinant of the matrix. 

(c) It uses the method of multiplication and subtraction with division as a 
compact means of computing the exact value of any minor of the determinant 
of the matrix (whether symmetric or non-symmetric). It further shows how all 
cofactors of order n — 1 (constituting the adjugate) can be computed from a 
compact presentation of the calculations of the determinant of the matrix. 

2. Gaussian methods and notation. Probably the method most generally 
used to solve simultaneous equations is the division method originated by Gauss 
[12]. Variations of this method are known as the Doolittle Method [13], the 
method of pivotal condensation [14], the method of single division [2; 104-112],
and the Crout method [8]. The methods as outlined by Gauss and Doolittle
are applicable only to axi-symmetric matrices (common to least squares theory)
while a more general presentation, applicable to non-symmetric matrices as well,
has been made by more recent authors.

The compact form of this method, extended to apply to the non-symmetric 
matrix, used in this paper is as follows: 

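Given the matrix

(1) $a = (a_{rk}) = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$

we compute

(2) $\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ b_{21} & a_{22.1} & a_{23.1} & \cdots & a_{2n.1} \\ b_{31} & b_{32.1} & a_{33.12} & \cdots & a_{3n.12} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2.1} & b_{n3.12} & \cdots & a_{nn.12 \cdots n-1} \end{bmatrix}$

where

(3) $b_{r1} = a_{r1}/a_{11}$
    $a_{2k.1} = a_{2k} - b_{21} a_{1k}$
    $b_{r2.1} = (a_{r2} - b_{r1} a_{12})/a_{22.1}$
    $a_{3k.12} = a_{3k} - b_{31} a_{1k} - b_{32.1} a_{2k.1}$
    $b_{r3.12} = (a_{r3} - b_{r1} a_{13} - b_{r2.1} a_{23.1})/a_{33.12}$

and in general

(4) $a_{rk.12 \cdots j} = a_{rk.12 \cdots j-1} - \frac{a_{jk.12 \cdots j-1}\; a_{rj.12 \cdots j-1}}{a_{jj.12 \cdots j-1}}, \qquad b_{rk.12 \cdots j} = \frac{a_{rk.12 \cdots j}}{a_{kk.12 \cdots j}}.$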

It should be noted that Crout's presentation [8] is similar to that used here
except that Crout divides the elements of each row by the leading element while
we divide the elements of columns.

The notation used above, introduced by one of the authors [2], parallels that 
used extensively in multiple correlation and regression theory. It differs some¬ 
what from the notation used by Gauss. See [12; 69]. 

Since every b is the ratio of two a's it follows that every b can be written in
terms of a's so that the formulas can be written in terms of a's alone. This is
what Gauss did although he used [ ]'s instead of a's. Gauss also used letters to
indicate the primary subscripts and a single secondary subscript to indicate
the number of eliminations. Thus our $a_{22.1}$ was written by Gauss as [bb, 1] and
$a_{33.12}$ appeared as [cc, 2].

It is in the interest of less extensive notation and it makes our notation
somewhat closer to that introduced by Gauss if we replace

$a_{rk.12 \cdots j}$ by $a_{rk.(j)}$

$b_{rk.12 \cdots j}$ by $b_{rk.(j)}$.

This shortened notation can always be used when the secondary subscripts
include all the integers from 1 to j. In this modified notation the formulas (4)
become

(5) $a_{rk.(j)} = a_{rk.(j-1)} - \frac{a_{jk.(j-1)}\; a_{rj.(j-1)}}{a_{jj.(j-1)}}, \qquad b_{rk.(j)} = \frac{a_{rk.(j)}}{a_{kk.(j)}}.$

3. Solution by matrix factorization. The values of matrix (2) are in general
not final answers to proposed problems but they are values from which final
answers can be computed. The matrix (2) exhibits essentially both the triangular
matrix of the $a_{rk.(j)}$ which we call t and the triangular matrix of the
$b_{rk.(j)}$ which we call s. (The diagonal entries of the s matrix are all unity
and do not appear.) Hence (2) is really s - I + t.

A basic property, useful in most problems involving the use of (2), is that s
and t are factors of a. Thus

(6) a = st and a - st = 0.

That this is true in the symmetric case was proved in an earlier paper [7; 86].
That this is also true for the non-symmetric case is now shown in a similar
manner.

Let $t_1$ be a matrix (n by n) with the first row composed of elements $a_{1k}$ and
all other elements 0. Let $s_1$ be a similar matrix with first column elements
$b_{r1} = a_{r1}/a_{11}$ and all other elements 0. Then
$a - s_1 t_1 = a_1 = (a_{rk.1})$ is a matrix (n by n) with all elements of the first
column and first row 0.

Next let $t_2$ be a matrix (n by n) with the second row elements $a_{2k.1}$ and all
other elements 0. Let $s_2$ be a matrix (n by n) with second column elements
$b_{r2.1}$ and all other elements 0. Then $a_1 - s_2 t_2 = a_2 = (a_{rk.(2)})$ is a
matrix (n by n) with each element of the first two columns and first two rows
equal to 0.

This process is continued through n successive steps, an additional row and
column being made identically zero at each step. We have then

(7) $a - s_1 t_1 - s_2 t_2 - \cdots - s_n t_n = a_n = 0.$

Now consider the triangular matrix

$t = t_1 + t_2 + t_3 + \cdots + t_n$

with its rows composed of the non-zero rows of the $t_i$. Consider also the
triangular matrix $s = s_1 + s_2 + \cdots + s_n$. Then
$st = s_1 t_1 + s_2 t_2 + \cdots + s_n t_n$ since $s_i t_j = 0$ for $i \ne j$; and (7)
becomes

a - st = 0 or a = st.
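
The factorization can be checked numerically with a short Python sketch. It follows formulas (3) and (5) directly (division of columns by the leading element), assumes that no leading element $a_{jj.(j-1)}$ vanishes, and uses the hypothetical name `compact_factors`; it is an illustration rather than the authors' desk-machine layout.

```python
def compact_factors(a):
    """Return (s, t) with a = s t, following the compact scheme (2)-(5).

    s is unit lower triangular (the b's); t is upper triangular (the
    a_{rk.(j)}). Assumes every leading element t[j][j] is non-zero.
    """
    n = len(a)
    s = [[1.0 if r == k else 0.0 for k in range(n)] for r in range(n)]
    t = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Row j of t: a_{jk.(j-1)} = a_jk - sum_i b_{ji} a_{ik.(i-1)}, k >= j.
        for k in range(j, n):
            t[j][k] = a[j][k] - sum(s[j][i] * t[i][k] for i in range(j))
        # Column j of s: b_{rj.(j-1)}, r > j, dividing by the leading element.
        for r in range(j + 1, n):
            s[r][j] = (a[r][j] - sum(s[r][i] * t[i][j] for i in range(j))) / t[j][j]
    return s, t
```

Applied to the 4 by 4 matrix with rows (26, -10, 15, 32), (19, 45, -14, -8), (-12, 16, 27, 13), (32, 29, -35, 28), the diagonal of `t` comes out as approximately 26, 52.308, 39.356 and 43.071, and multiplying `s` by `t` recovers the matrix.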


4. Gaussian computation of inverse (and adjugate) without formal reduction
of auxiliary identity matrix. The inverse of a, $a^{-1} = C = (c_{rk})$, can be
calculated directly from the matrices s and t of (2). The adjugate
$D = (d_{rk})$ can be calculated by multiplication by the determinant of the matrix
and this can be calculated by the well known formula

(8) $\Delta = a_{11}\, a_{22.1}\, a_{33.(2)} \cdots a_{nn.(n-1)}.$

The theory is presented in some detail and illustrated for the case n = 4 after
which a more general matrix presentation is given. The matrix equation
aC = I is equivalent to the following $4^2$ simultaneous equations in the $4^2$
unknowns $(c_{rk})$:

                                                                       k=1  k=2  k=3  k=4
(9)   $a_{11} c_{1k} + a_{12} c_{2k} + a_{13} c_{3k} + a_{14} c_{4k}$  =    1    0    0    0
      $a_{21} c_{1k} + a_{22} c_{2k} + a_{23} c_{3k} + a_{24} c_{4k}$  =    0    1    0    0
      $a_{31} c_{1k} + a_{32} c_{2k} + a_{33} c_{3k} + a_{34} c_{4k}$  =    0    0    1    0
      $a_{41} c_{1k} + a_{42} c_{2k} + a_{43} c_{3k} + a_{44} c_{4k}$  =    0    0    0    1

Now since Ca = I also we have a'C' = I and there results another set of $4^2$
equations in the $4^2$ unknowns $(c_{rk})$.

                                                                       r=1  r=2  r=3  r=4
(10)  $a_{11} c_{r1} + a_{21} c_{r2} + a_{31} c_{r3} + a_{41} c_{r4}$  =    1    0    0    0
      $a_{12} c_{r1} + a_{22} c_{r2} + a_{32} c_{r3} + a_{42} c_{r4}$  =    0    1    0    0
      $a_{13} c_{r1} + a_{23} c_{r2} + a_{33} c_{r3} + a_{43} c_{r4}$  =    0    0    1    0
      $a_{14} c_{r1} + a_{24} c_{r2} + a_{34} c_{r3} + a_{44} c_{r4}$  =    0    0    0    1

Fisher [11; 160] has shown that the equations (9) could be solved by reducing the
unit matrix on the right. One of the authors has shown how to calculate the
inverse of a symmetric matrix by Gaussian methods without reducing the unit
matrix [1]. We now show how to reduce the non-symmetric matrix similarly.
By the same process used in getting from matrix (1) to matrix (2), we can reduce
the $4^2$ equations of (9) to the $4^2$ auxiliary equations below.

                                                                               k=1  k=2  k=3  k=4
(11)  $a_{11} c_{1k} + a_{12} c_{2k} + a_{13} c_{3k} + a_{14} c_{4k}$          =    1    0    0    0
                      $a_{22.1} c_{2k} + a_{23.1} c_{3k} + a_{24.1} c_{4k}$    =    *    1    0    0
                                      $a_{33.(2)} c_{3k} + a_{34.(2)} c_{4k}$  =    *    *    1    0
                                                          $a_{44.(3)} c_{4k}$  =    *    *    *    1

The terms marked * can be computed by the process. However if we do not
compute these terms we have ten equations with the right hand terms either
1 or 0.

In a similar way the $4^2$ equations of (10) can be reduced to the $4^2$ auxiliary
equations below. As above we may neglect the calculation of the diagonal
terms, and of all terms below the diagonal, and still have six equations (with
terms on the right zero).

                                                                       r=1  r=2  r=3  r=4
(12)  $c_{r1} + b_{21} c_{r2} + b_{31} c_{r3} + b_{41} c_{r4}$         =    *    0    0    0
                $c_{r2} + b_{32.1} c_{r3} + b_{42.1} c_{r4}$           =    *    *    0    0
                                  $c_{r3} + b_{43.(2)} c_{r4}$         =    *    *    *    0
                                                      $c_{r4}$         =    *    *    *    *

The ten equations of (11) with the six equations of (12) are sufficient for
determining the inverse matrix. Solve (11) for k = 4; then solve (12) for r = 4;
then solve (11) for k = 3; then solve (12) for r = 3; etc. Each equation can be
solved completely on the machine to give a value of a $c_{rk}$.
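
The alternating order of solution can be sketched in Python as follows. The sketch is an illustration under assumptions, not the authors' computing form: it recomputes the factors with a helper (`compact_factors`, a hypothetical name), assumes non-zero leading elements, and then uses only the equations of (11) whose right sides are 1 or 0 and the equations of (12) whose right sides are 0.

```python
def compact_factors(a):
    """Unit lower triangular s and upper triangular t with a = s t."""
    n = len(a)
    s = [[1.0 if r == k else 0.0 for k in range(n)] for r in range(n)]
    t = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            t[j][k] = a[j][k] - sum(s[j][i] * t[i][k] for i in range(j))
        for r in range(j + 1, n):
            s[r][j] = (a[r][j] - sum(s[r][i] * t[i][j] for i in range(j))) / t[j][j]
    return s, t


def inverse_from_factors(s, t):
    """Inverse C of a = s t without reducing an auxiliary identity matrix."""
    n = len(t)
    c = [[0.0] * n for _ in range(n)]
    for k in range(n - 1, -1, -1):
        # Equations (11), column k: rows k, k-1, ..., 0 (right side 1 or 0).
        for r in range(k, -1, -1):
            rhs = 1.0 if r == k else 0.0
            acc = sum(t[r][j] * c[j][k] for j in range(r + 1, n))
            c[r][k] = (rhs - acc) / t[r][r]
        # Equations (12), row k: columns k-1, ..., 0 (right side 0).
        for col in range(k - 1, -1, -1):
            c[k][col] = -sum(c[k][j] * s[j][col] for j in range(col + 1, n))
    return c
```

Each entry of `c` is obtained from a single equation in terms of entries already found, which mirrors the remark that every equation can be solved completely on the machine. A final multiplication of the original matrix by `c` serves as the check.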

It should be noted that Gaussian methods are approximation methods since 
they are division methods. For a discussion and treatment of the errors re¬ 
sulting the reader is referred to papers by Hotelling [9] and Satterthwaite [10] 
to which further reference is made in the next section. 

Different forms for presentation of the results may be used. We suggest
the following form which presents first the matrix (1), then the terms of the
matrix (2). The terms of the matrix C' are then computed by (11) and (12)
and placed diagonally adjacent to the terms of (2). The transpose of C is used
so that the check multiplication by a may be most easily accomplished. The
result of this multiplication which next appears shows that the computed value
of C is correct to three places. The final matrix of Table I gives the value of
the adjugate, D, as found by multiplying each element of the inverse
by (26)(52.308)(39.356)(43.071) = 2,305,300 (to five places).

It is possible to check the accuracy of the entries of each row and column 
of the matrix (2) separately by using a check sum to the right of each row and 
at the bottom of each column. We have not taken the space to show check 
sums and they are not particularly needed after one gets a little practice with 
the method. In any case $a\,a^{-1}$ should be computed as a final check.

A more general matrix presentation results from the use of (6). The matrix
equation aC = I becomes stC = I and hence the auxiliary equation becomes

(13) $tC = s^{-1}.$

Now since s is triangular with unit diagonal terms and zeros above the diagonal,
it follows that $s^{-1}$ also has unit diagonal terms with zeros above the
diagonal. Hence we can select $n(n+1)/2$ equations from the $n^2$ equations of (13)
which demand no further knowledge of the entries of $s^{-1}$. A similar treatment
of the matrix equation a'C' = I yields t's'C' = I and

(14) $s'C' = (t')^{-1},$

which yields $n(n-1)/2$ equations involving zero terms of $(t')^{-1}$. These two sets
of equations taken together in the proper order are sufficient for calculating the
$n^2$ values in the inverse.

It may be of interest to note that this is also a procedure for calculating
$t^{-1}s^{-1}$ when t and s are known without the calculation of $t^{-1}$ and
$s^{-1}$ separately since

(15) $C = t^{-1}s^{-1}.$

TABLE I

Suggested form for calculation

Matrix (1):

        26        -10         15         32
        19         45        -14         -8
       -12         16         27         13
        32         29        -35         28

Matrix (2), with the terms of C' entered diagonally adjacent:

        26        -10         15         32
           .02873    -.00696     .01825    -.00283
    .73077     52.308    -24.962    -31.385
           .02436     .01239     .01440    -.02267
   -.46154     .21765     39.356     34.600
          -.02302     .01572     .00791     .01991
   1.23077     .78970    -.85753     43.071
          -.01519     .00419    -.02041     .02322

Check multiplication:

     1.000      0.000      0.000      0.000
     0.000      1.000      0.000      0.000
     0.000      0.000      1.000      0.000
     0.000      0.000      0.000      1.000

Adjugate:

     66231     -16045      42072      -6524
     56157      28563      33196     -52261
    -53068      36239      18235      45899
    -35018       9659     -47051      53529


5. The method of multiplication and subtraction with division. We now
present a different method, based upon the work of Hermite [15] and Chiò [16]
together with important modifications suggested by the work of Dodgson [17].
Current presentations of the basic method include the "method of condensation"
[18; 45-48] and in compact forms, the "method of multiplication and subtraction"
of one of the authors [2; 197-202].

In Gaussian methods we divide each element of a column by the leading
(diagonal) element of that column. In the method of multiplication and
subtraction we use the leading element as a "pivot," forming a number of
two-rowed determinants. Thus we use the leading elements as multipliers rather
than as divisors. No divisions are made in this method. This is a very real
advantage when the elements of the original matrix contain only two (or three)
digits each and when n < 7 (or 5). In such cases we can use this method to
compute exactly the values of any minor of the determinant of the matrix and
even the adjugate itself.

It is perhaps well to mention here that error control is difficult with division 
(Gaussian) methods. Even if many significant places are carried the errors 
may be significant, cumulative, and difficult to measure. The techniques 
suggested by the papers of Hotelling [9] and Satterthwaite [10] are most useful in 
developing error control in matrix calculation. However, where accuracy is 
important, and when the number of digits is not excessive, there appears to be 
merit in calculating the exact values. 

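In the method of multiplication and subtraction, we compute from the matrix
(1) the following matrix

(16)
$$\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & A_{22\cdot 1} & A_{23\cdot 1} & \cdots & A_{2n\cdot 1} \\
a_{31} & A_{32\cdot 1} & A_{33\cdot(2)} & \cdots & A_{3n\cdot(2)} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{n1} & A_{n2\cdot 1} & A_{n3\cdot(2)} & \cdots & A_{nn\cdot(n-1)}
\end{bmatrix}$$

where

(17)
$$A_{rk\cdot 1} = a_{11}a_{rk} - a_{1k}a_{r1}, \qquad
A_{rk\cdot(2)} = A_{22\cdot 1}A_{rk\cdot 1} - A_{2k\cdot 1}A_{r2\cdot 1},$$

and in general

$$A_{rk\cdot(j)} = A_{jj\cdot(j-1)}A_{rk\cdot(j-1)} - A_{jk\cdot(j-1)}A_{rj\cdot(j-1)}.$$

This notation is similar to that used in connection with Gaussian methods above.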
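
The recurrence (17) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' computing form: the function name `multiply_subtract`, the 0-based indexing, and the choice of sample matrix (the small integer determinant used as an example later in this section, taken with third row -2, -1, 1, 3) are assumptions made here. Because the leading elements act only as multipliers, integer input stays integer throughout.

```python
def multiply_subtract(a):
    """Compact array of the method of multiplication and subtraction.

    With 0-based indices, entry (r, k) of the result has been condensed
    min(r, k) times, each time by the rule
        new[r][k] = pivot * old[r][k] - old[j][k] * old[r][j],
    where pivot = old[j][j] is the current leading element.
    No division is performed anywhere.
    """
    n = len(a)
    A = [list(row) for row in a]          # work on a copy
    for j in range(n - 1):
        pivot = A[j][j]
        for r in range(j + 1, n):
            for k in range(j + 1, n):
                # A[r][j] and A[j][k] are not modified during stage j,
                # so they still hold the values of the previous stage.
                A[r][k] = pivot * A[r][k] - A[j][k] * A[r][j]
    return A


example = [
    [2, 1, -3, 4],
    [3, 2, 2, 1],
    [-2, -1, 1, 3],
    [4, -3, 2, 1],
]
compact = multiply_subtract(example)
for row in compact:
    print(row)
```

The last entry of this array is not the determinant itself but the determinant multiplied by powers of the earlier leading elements, which is the growth that the division step of the next method removes.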

In the method of multiplication and subtraction with division, we compute
from the matrix (1) the following matrix:

In general the method calls for the calculation of entries according to the
method of multiplication and subtraction but in addition calls for the division
by the leading element of the second preceding row or column. Since this
division must be exact, as is shown in the next section, we have at each stage
a good numerical check on the work as well as an exact value of the entry.
Furthermore it is shown in the next section that the value of $B_{rk\cdot(j)}$ is the exact value
of the determinant

(21)
$$\begin{vmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1j} & a_{1k} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2j} & a_{2k} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3j} & a_{3k} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
a_{j1} & a_{j2} & a_{j3} & \cdots & a_{jj} & a_{jk} \\
a_{r1} & a_{r2} & a_{r3} & \cdots & a_{rj} & a_{rk}
\end{vmatrix}$$


All the recorded entries (themselves values of determinants) are calculated on
the machine. The only limitation is the number of places the machine provides.
For the trivial problems (composed of small integers) found in most texts of
College Algebra, one can calculate the values readily without machines. For
example the determinant

$$A = \begin{vmatrix}
2 & 1 & -3 & 4 \\
3 & 2 & 2 & 1 \\
-2 & -1 & 1 & 3 \\
4 & -3 & 2 & 1
\end{vmatrix}$$

yields at once

$$\begin{matrix}
2 & 1 & -3 & 4 \\
3 & 1 & 13 & -10 \\
-2 & 0 & -2 & 7 \\
4 & -10 & 73 & -397
\end{matrix}$$

and the value of A is -397. All the other entries are also minors of A.
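
The method with division can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' layout: the name `condense_with_division` is hypothetical, indices are 0-based, every leading element is assumed to be non-zero, and the small determinant is taken with third row -2, -1, 1, 3, the only reading consistent with the recorded array ending in -397. At each stage the entry of the method of multiplication and subtraction is divided by the leading element of the stage before the previous one; in modern terminology this is a fraction-free elimination. The code raises an error if a division is not exact, which is the numerical check described in the text.

```python
def condense_with_division(a):
    """Compact array of the method of multiplication and subtraction
    with division, for a square matrix of integers.

    Entry (r, k) of the result (0-based) is a minor of the original
    matrix; the last diagonal entry is the determinant.
    Assumes every leading element met along the way is non-zero.
    """
    n = len(a)
    B = [list(row) for row in a]
    divisor = 1                       # no division at the first stage
    for j in range(n - 1):
        pivot = B[j][j]
        for r in range(j + 1, n):
            for k in range(j + 1, n):
                numerator = pivot * B[r][k] - B[j][k] * B[r][j]
                quotient, remainder = divmod(numerator, divisor)
                if remainder != 0:
                    raise ArithmeticError("division not exact: check the work")
                B[r][k] = quotient
        divisor = pivot               # leading element used two stages later
    return B


small = [[2, 1, -3, 4], [3, 2, 2, 1], [-2, -1, 1, 3], [4, -3, 2, 1]]
table_ii = [[26, -10, 15, 32], [19, 45, -14, -8],
            [-12, 16, 27, 13], [32, 29, -35, 28]]

small_array = condense_with_division(small)
table_ii_array = condense_with_division(table_ii)
print(small_array[-1][-1], table_ii_array[-1][-1])
```

Applied to the 4 by 4 matrix of Table II, the same routine gives the recorded integers 1360, -649, -816, 296, 53524, 47056, 1074, -45899 and the determinant 2305327.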

Dodgson introduced a method of multiplication and subtraction with division
as early as 1866 [17]. He, however, used a moving pivot. For our purposes it
seems preferable to use a fixed pivot as we suggest in this paper.


6. Proofs of theorems involving the $B_{rk\cdot(j)}$.

(a) First theorem. We first prove that the numerator
$B_{jj\cdot(j-1)}B_{rk\cdot(j-1)} - B_{jk\cdot(j-1)}B_{rj\cdot(j-1)}$ in the definition of $B_{rk\cdot(j)}$
is exactly divisible by the denominator $B_{j-1,j-1\cdot(j-2)}$. To do this we expand
the terms of this numerator of (20) with the continued use of

(22)
$$B_{rk\cdot(j-1)} = \frac{B_{j-1,j-1\cdot(j-2)}B_{rk\cdot(j-2)} - B_{j-1,k\cdot(j-2)}B_{r,j-1\cdot(j-2)}}{B_{j-2,j-2\cdot(j-3)}}$$

(which is (20) with j replaced by j - 1) and then we multiply and cancel. It
is found that $B_{j-1,j-1\cdot(j-2)}$ is a factor of all non-cancellable terms, so the exact
divisibility is proved.

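(b) Second theorem. We next prove that $B_{rk\cdot(j)}$ is the value of the determinant
(21). We illustrate first for j = 3 and then give a more general proof. When
j = 3

$$\begin{vmatrix}
a_{11} & a_{12} & a_{13} & a_{1k} \\
a_{21} & a_{22} & a_{23} & a_{2k} \\
a_{31} & a_{32} & a_{33} & a_{3k} \\
a_{r1} & a_{r2} & a_{r3} & a_{rk}
\end{vmatrix}
= \frac{1}{a_{11}^{3}}
\begin{vmatrix}
a_{11} & a_{12} & a_{13} & a_{1k} \\
0 & B_{22\cdot 1} & B_{23\cdot 1} & B_{2k\cdot 1} \\
0 & B_{32\cdot 1} & B_{33\cdot 1} & B_{3k\cdot 1} \\
0 & B_{r2\cdot 1} & B_{r3\cdot 1} & B_{rk\cdot 1}
\end{vmatrix}$$

$$= \frac{1}{a_{11}^{2}}
\begin{vmatrix}
B_{22\cdot 1} & B_{23\cdot 1} & B_{2k\cdot 1} \\
B_{32\cdot 1} & B_{33\cdot 1} & B_{3k\cdot 1} \\
B_{r2\cdot 1} & B_{r3\cdot 1} & B_{rk\cdot 1}
\end{vmatrix}
= \frac{1}{B_{22\cdot 1}}
\begin{vmatrix}
B_{33\cdot(2)} & B_{3k\cdot(2)} \\
B_{r3\cdot(2)} & B_{rk\cdot(2)}
\end{vmatrix}
= B_{rk\cdot(3)}.$$

In the more general case we designate the determinant (21) by $|a_{rk}|$ and reduce
the order by the “condensation” method just illustrated. It is understood
that the values of $B_{rk\cdot(i)}$ used in the following proof have primary subscripts
larger than secondary subscripts since the rank of the resulting determinant
decreases with each condensation:

(23)
$$|a_{rk}| = \frac{1}{a_{11}^{\,j-1}}\,|B_{rk\cdot 1}|
= \frac{1}{B_{22\cdot 1}^{\,j-2}}\,|B_{rk\cdot(2)}|
= \cdots
= \frac{1}{B_{j-1,j-1\cdot(j-2)}}\,|B_{rk\cdot(j-1)}|
= B_{rk\cdot(j)}.$$
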


It is to be noted that the first theorem, since each $B_{rk\cdot(j)}$ can be interpreted as a
determinant by the second theorem, is a corollary of a well known theorem
[19; 33]. In a conventional determinantal notation it might appear as

(24) $A\,A_{jr,jk} = A_{rk}A_{jj} - A_{rj}A_{jk}$

where the first subscripts indicate deleted rows and the second subscripts deleted
columns.

(c) Third theorem. We next relate the values of $B_{rk\cdot(j)}$ and the values $a_{rk\cdot(j)}$
and $b_{rk\cdot(j)}$. With the use of the second theorem (23) and (8) we have

(25)
$$\frac{B_{rk\cdot(j)}}{a_{rk\cdot(j)}} = \frac{a_{11}a_{22\cdot 1}a_{33\cdot(2)} \cdots a_{jj\cdot(j-1)}\,a_{rk\cdot(j)}}{a_{rk\cdot(j)}} = B_{jj\cdot(j-1)}$$

and with the additional use of (4)

(26)
$$\frac{B_{rk\cdot(j)}}{b_{rk\cdot(j)}} = \frac{a_{11}a_{22\cdot 1}a_{33\cdot(2)} \cdots a_{jj\cdot(j-1)}\,a_{rk\cdot(j)}}{a_{rk\cdot(j)}/a_{kk\cdot(j)}} = B_{kk\cdot(j)}.$$

These formulas may be written in the form

(27) $B_{rk\cdot(j)} = B_{jj\cdot(j-1)}\,a_{rk\cdot(j)}$

(28) $B_{rk\cdot(j)} = B_{kk\cdot(j)}\,b_{rk\cdot(j)}$

and since $B_{jj\cdot(j-1)}$ and $B_{j+1,j+1\cdot(j)}$ are diagonal terms, it follows that the matrix
(18) can be obtained from the matrix (2) by multiplication by diagonal matrices.

(d) Fourth theorem. A fourth theorem gives explicit matrix formulation
to these results and shows how the values of the matrix (18) can be used in
factoring the matrix (1). Now (27) and (28) can be written in the form

(29) $T = M_T\,t$

(30) $S = s\,M_S$

where $M_T$ is the diagonal matrix which multiplies $t$ to get $T$ and $M_S$ is the
diagonal matrix which multiplies $s$ to get $S$. The values of the $T$ matrix are
the values of (18) with $r \le k$ while the values of the $S$ matrix are the values of
(18) with $r \ge k$. The diagonal matrix $M_T$ is composed of diagonal elements
$[1, a_{11}, B_{22\cdot 1}, \cdots, B_{n-1,n-1\cdot(n-2)}]$ while the matrix $M_S$ is composed of diagonal
elements $[a_{11}, B_{22\cdot 1}, B_{33\cdot(2)}, \cdots, B_{nn\cdot(n-1)}]$. The basic matrix factorization
equation (6) then appears as

(31) $a = S\,M_S^{-1}M_T^{-1}\,T$.

It is to be noted that exact values of elements of all these matrices are
available if the inverse diagonal matrices are written in fractional form, subject of
course to practical limitations such as number of places of computing machine,
etc.


7. Computation of the adjugate matrix. We now present matrix formulas
which enable one to compute the adjugate of $a$ compactly with the method of
multiplication and subtraction with division. If $|a|$ is the determinant of
$a$, $I$ is the unit matrix and $D$ is the adjugate of $a$, we have

$$\begin{aligned}
aD &= |a|\,I \\
s\,t\,D &= |a|\,I \\
t\,D &= |a|\,s^{-1} \\
M_T\,t\,D &= M_T\,|a|\,s^{-1}
\end{aligned}$$

(32) $T\,D = M_T\,|a|\,s^{-1}$

and similarly

$$\begin{aligned}
a'D' &= |a|\,I \\
t's'D' &= |a|\,I \\
s'D' &= |a|\,(t')^{-1} \\
M_S\,s'D' &= M_S\,|a|\,(t')^{-1}
\end{aligned}$$

(33) $S'D' = M_S\,|a|\,(t')^{-1}$.

The computational procedure in getting the adjugate is very similar to that
used in getting the inverse in section 4. $T$ and $S$ are triangular matrices while
$s^{-1}$ and $(t')^{-1}$ are the matrices used before. The values of
$M_T\,[1, a_{11}, \cdots, B_{n-1,n-1\cdot(n-2)}]$, $M_S\,[a_{11}, B_{22\cdot 1}, B_{33\cdot(2)}, \cdots, B_{nn\cdot(n-1)}]$ and $|a|$ are first computed
by (18) so that $M_T|a|$ and $M_S|a|$ can be calculated. Without further calculation
we are able to select $n(n+1)/2$ equations from the matrix equation (32)
having known coefficients on the right, $n(n-1)/2$ of which are zero, and $n(n-1)/2$
equations from the matrix equation (33) having zero coefficients on the right.
These constitute the $n^2$ equations necessary to determine the $n^2$ values of $d_{rk}$.
These values of $d_{rk}$ can all be calculated directly on the machine and, what is
more useful in discovering calculational errors, the divisions yielding the $d_{rk}$
must be exact.

For n = 4 these $n^2$ equations are

(34)
$$\begin{aligned}
a_{11}d_{1k} + a_{12}d_{2k} + a_{13}d_{3k} + a_{14}d_{4k} &= |a|\ (k = 1), \quad 0\ (k = 2, 3, 4) \\
B_{22\cdot 1}d_{2k} + B_{23\cdot 1}d_{3k} + B_{24\cdot 1}d_{4k} &= a_{11}|a|\ (k = 2), \quad 0\ (k = 3, 4) \\
B_{33\cdot(2)}d_{3k} + B_{34\cdot(2)}d_{4k} &= B_{22\cdot 1}|a|\ (k = 3), \quad 0\ (k = 4) \\
B_{44\cdot(3)}d_{4k} &= B_{33\cdot(2)}|a|\ (k = 4)
\end{aligned}$$

(35)
$$\begin{aligned}
a_{11}d_{r1} + a_{21}d_{r2} + a_{31}d_{r3} + a_{41}d_{r4} &= 0\ (r = 2, 3, 4) \\
B_{22\cdot 1}d_{r2} + B_{32\cdot 1}d_{r3} + B_{42\cdot 1}d_{r4} &= 0\ (r = 3, 4) \\
B_{33\cdot(2)}d_{r3} + B_{43\cdot(2)}d_{r4} &= 0\ (r = 4)
\end{aligned}$$



The process is similar to that of section 4. An illustration for the case n = 4
is given in Table II. The matrix of the B's is directly below the matrix $a$ and
the calculated values of the elements of $D'$ (obtained by solving (34) and (35))
are placed diagonally in the cells with the B's. The values of the transpose of
$D$ are used so that the check, premultiplication by $a$, is easily carried out. The
next matrix in Table II exhibits $aD = |a|\,I$. The last matrix of Table II
is a five decimal place approximation to $(a^{-1})'$ which is obtained by dividing the
entries of $D'$ by $|a|$. Since we know these are the correct five decimal place
values of $(a^{-1})'$ we may compare the corresponding values of Table I to see how
much those are in error. It should be noticed that the approximation to $(a^{-1})'$ may
be readily carried to more than five decimal places if desired.

As with the Gaussian methods, it is possible here, also, to check each row 
and column individually by using check sums. 
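
The check by premultiplication can be sketched numerically with the integers recorded in Table II. This is an illustrative sketch only: the variable names are assumptions, and the entries are copied from the table as read here. Since the table records the transpose of the adjugate, the product of row i of $a$ with row j of the recorded array is the (i, j) element of $aD$.

```python
# Matrix a and the transposed adjugate D' as recorded in Table II.
a = [
    [26, -10, 15, 32],
    [19, 45, -14, -8],
    [-12, 16, 27, 13],
    [32, 29, -35, 28],
]
d_transposed = [
    [66233, -16033, 42069, -6503],
    [56151, 28558, 33194, -52258],
    [-53068, 36236, 18224, 45899],
    [-35013, 9659, -47056, 53524],
]
det_a = 2305327


def row_dot(u, v):
    """Sum of products of corresponding elements of two rows."""
    return sum(x * y for x, y in zip(u, v))


# (a D)[i][j] = (row i of a) . (column j of D) = (row i of a) . (row j of D')
check = [[row_dot(a[i], d_transposed[j]) for j in range(4)] for i in range(4)]

# Five decimal place approximation to the transposed inverse.
inverse_transposed = [[round(v / det_a, 5) for v in row] for row in d_transposed]

for row in check:
    print(row)
print(inverse_transposed[0])
```

Every off-diagonal element of the product vanishes exactly and every diagonal element equals the determinant, so a single wrong digit in any cell would show up at once.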

The work necessary for the computation of the adjugate from the matrix of
the B's can be shortened somewhat by the use of the fact that the adjugate is
composed of the cofactors of the $a_{rk}$. Now the cofactors of the four terms in
the lower right hand corner are $d_{nn} = B_{n-1,n-1\cdot(n-2)}$; $d_{n-1,n} = -B_{n-1,n\cdot(n-2)}$;
$d_{n,n-1} = -B_{n,n-1\cdot(n-2)}$; and $d_{n-1,n-1} = B_{nn\cdot(n-2)}$, and these are available from the
calculation of the B's though $B_{nn\cdot(n-2)}$ is not recorded. (See the lower right
four entries of the B's and d's in Table II.) With these four values
immediately available, the use of but $n^2 - 4$ additional equations is demanded,
or this additional information can be used in checking.

TABLE II

Suggested form for computation of adjugate (with check) and then inverse

        26        -10         15         32
        19         45        -14         -8
       -12         16         27         13
        32         29        -35         28

        26        -10         15         32
     66233     -16033      42069      -6503

        19       1360       -649       -816
     56151      28558      33194     -52258

       -12        296      53524      47056
    -53068      36236      18224      45899

        32       1074     -45899    2305327
    -35013       9659     -47056      53524

   2305327          0          0          0
         0    2305327          0          0
         0          0    2305327          0
         0          0          0    2305327

    .02873    -.00695     .01825    -.00282
    .02436     .01239     .01440    -.02267
   -.02302     .01572     .00791     .01991
   -.01519     .00419    -.02041     .02322

MULTIPLE MATCHING AND RUNS BY THE SYMBOLIC METHOD

Irving Kaplansky and John Riordan 
New York City 

1. Introduction. The two subjects in the title have generally been treated by 
distinct methods, an excellent summary of which is given by S. S. Wilks in 
Chapter X of [13]. For two-deck matching, an appreciable simplification over 
the classical work of MacMahon [7], which seems to underlie the generating 
function used by Wilks [12] and Battin [2], has been shown by one of us [5] 
to follow from symbolic methods. Here we give an elaboration of these methods 
to multiple matching and to runs. 

The basis of the symbolic method in both problems has been given in [6], 
but for completeness a skeleton resume is given in Section 2 below. A new 
point is stressed: the relation of coefficients in polynomials of the symbolic 
method to factorial moments (cf. Fréchet [4]).

The emphasis for the most part is on showing the expedition of the symbolic 
method in reaching known results, but in several instances new results are 
obtained. 

2. Symbolic expressions and moments. Let $A_1, \cdots, A_n$ be arbitrary events
and let $p(A_{i_1}, \cdots, A_{i_k})$ denote the joint probability of $A_{i_1}, \cdots, A_{i_k}$; let
$P_r$ be the probability that exactly r of the events occur. Then

(1) $P_r = \sum_{k=0}^{n} (-1)^r\,{}_kC_r\,\Sigma(-1)^k p(A_{i_1}, \cdots, A_{i_k})$

and in particular

$P_0 = \sum_{k=0}^{n} \Sigma(-1)^k p(A_{i_1}, \cdots, A_{i_k})$,

or symbolically

(2) $P_0 = [1 - p(A_1)][1 - p(A_2)] \cdots [1 - p(A_n)]$.

The cases to be studied will be exclusively ones where so-called quasi-symmetry
holds, i.e., $p(A_{i_1}, \cdots, A_{i_k})$ is either 0 or a function $\phi_k$ of k alone. In that
event (2) can be evaluated as follows: suppress all products that vanish, and
form a polynomial $f(E)$ by replacing each surviving term $p(A_i)$ by E. Then
$P_0 = f(E)\phi_0$ where E is a displacement operator: $E^k\phi_0 = \phi_k$.

The same polynomial $f(E)$ can also be used to obtain $P_r$ and the moments of
the distribution. From (1) we see that $P_r = f(E)\phi_r$, where $E^k\phi_r = (-1)^r\,{}_kC_r\,\phi_k$.
Again it is well known (Fréchet [4]) that the k-th factorial moment, defined by

$M_{(k)} = \sum_{i=0}^{n} i(i - 1) \cdots (i - k + 1)P_i$,

is also given by

$M_{(k)} = k!\,\Sigma p(A_{i_1}, \cdots, A_{i_k})$.

It follows that the terms of $f(E)\phi_0$ are essentially the factorial moments. More
precisely, if

$f(E) = \sum_{k=0}^{n} S_k(-E)^k$,

then

(3) $M_{(k)} = k!\,S_k\phi_k$.

3. Card matching. To avoid complications which add nothing to the fundamental
idea, the case of three decks will be considered explicitly. As remarked
by Battin [2], there is no loss of generality in supposing that the three decks
have the same number of cards: let them be numbered from 1 to n. Let $p_{ijk}$
denote the probability that the i-th, j-th, and k-th cards of the three decks are
matched, that is, all occur in say the l-th place. The condition of quasi-symmetry
is fulfilled, the (symbolic) product of k of the p's being either 0 or
$\phi_k = [(n - k)!/n!]^2$.

The simplest problem is to find the probability that there be no triple matches
of the form (i, i, i). Since no products of the expression

$(1 - p_{111})(1 - p_{222}) \cdots (1 - p_{nnn})$

vanish, the answer is $(1 - E)^n\phi_0$, in agreement with Anderson [1] (cf. also
problem E 589 in the American Mathematical Monthly, p. 512, 1943; solution
by John Riordan, p. 287, 1944).
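
As a small check of the symbolic answer $(1 - E)^n\phi_0$ with $E^k\phi_0 = [(n - k)!/n!]^2$, the following sketch compares the expanded formula with a direct enumeration for small n. The function names are hypothetical, and the enumeration assumes that the first deck is kept in natural order while the other two run over all orders, which loses no generality.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def prob_no_triple_match(n):
    """Expand (1 - E)^n phi_0 with E^k phi_0 = [(n - k)!/n!]^2."""
    total = sum((-1) ** k * comb(n, k) * factorial(n - k) ** 2
                for k in range(n + 1))
    return Fraction(total, factorial(n) ** 2)


def prob_by_enumeration(n):
    """Count orders of decks two and three with no place holding card i
    in all three decks (deck one is fixed in natural order)."""
    places = range(n)
    favourable = 0
    for second in permutations(places):
        for third in permutations(places):
            if not any(second[pos] == pos and third[pos] == pos
                       for pos in places):
                favourable += 1
    return Fraction(favourable, factorial(n) ** 2)


for n in (2, 3, 4):
    print(n, prob_no_triple_match(n), prob_by_enumeration(n))
```

The enumeration grows as $(n!)^2$, so it is useful only as a check for very small decks; the symbolic expression costs n + 1 terms.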

Suppose now that the decks are given compositions in the usual fashion by
having $a_1, b_1, c_1$ aces respectively, $a_2, b_2, c_2$ deuces, etc. We may number the
cards so that $1, \cdots, a_1$ are aces, $a_1 + 1, \cdots, a_1 + a_2$ are deuces, and similarly
in the other decks. The probability of precisely r matches among cards of the
same denomination is then given by

(4) $F(a_1, b_1, c_1)F(a_2, b_2, c_2) \cdots \phi_r$,

where

$F(a, b, c) = \Pi(1 - p_{ijk})$,

the symbolic product being taken over ranges $i = 1, \cdots, a$, $j = 1, \cdots, b$,
$k = 1, \cdots, c$.

A simple combinatorial argument reveals that

(5) $F(a, b, c) = \Sigma_t\,(a)_t(b)_t(c)_t(-E)^t/t!$

where $(a)_t = a(a - 1) \cdots (a - t + 1)$ is the Jordan factorial notation. The
problem of matching arbitrary decks is thus compactly solved by (4) and (5).

4. Examples. When decks of explicit structure are in question, the computation
of probabilities and moments reduces to straightforward algebra, as is
illustrated in the three following examples.

1. Suppose each of three decks has two suits of two cards each. Then, since

$F(2, 2, 2)^2 = (1 - 8E + 4E^2)^2 = 1 - 16E + 72E^2 - 64E^3 + 16E^4$,

it follows that

$(4!)^2P_0 = (4!)^2 - 16(3!)^2 + 72(2!)^2 - 64(1!)^2 + 16(0!)^2 = 576 - 576 + 288 - 64 + 16 = 240$,

and the calculation of $(4!)^2P_r$ may be set forth as follows:

r
0    576 - 576 + 288 - 64 + 16 = 240
1          576 - 576 + 192 - 64 = 128
2                288 - 192 + 96 = 192
3                       64 - 64 = 0
4                            16 = 16

each column being obtained by multiplying its first row entry by a binomial
coefficient. These results may be verified readily by direct enumeration.
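
The arithmetic of this example can be reproduced mechanically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, polynomials in E are stored as coefficient lists, and $P_r$ is obtained by replacing $E^k$ with $(-1)^r\,{}_kC_r\,\phi_k$, where $\phi_k = [(n - k)!/n!]^2$ for three decks.

```python
from math import comb, factorial


def falling(x, t):
    """Jordan factorial (x)_t = x(x - 1)...(x - t + 1)."""
    result = 1
    for i in range(t):
        result *= x - i
    return result


def deck_polynomial(a, b, c):
    """Coefficients of F(a, b, c) = sum_t (a)_t (b)_t (c)_t (-E)^t / t!."""
    top = min(a, b, c)
    return [(-1) ** t * comb(a, t) * falling(b, t) * falling(c, t)
            for t in range(top + 1)]          # (a)_t / t! = comb(a, t)


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def poly_pow(p, exponent):
    out = [1]
    for _ in range(exponent):
        out = poly_mul(out, p)
    return out


def scaled_match_counts(poly, n, decks=3):
    """Return (n!)^(decks-1) * P_r for r = 0, ..., n."""
    counts = []
    for r in range(n + 1):
        total = 0
        for k in range(r, len(poly)):
            total += ((-1) ** r * comb(k, r) * poly[k]
                      * factorial(n - k) ** (decks - 1))
        counts.append(total)
    return counts


poly = poly_pow(deck_polynomial(2, 2, 2), 2)   # two suits of two cards
counts = scaled_match_counts(poly, n=4)
print(poly, counts)
```

The five counts add up to $(4!)^2 = 576$, as they must for a probability distribution scaled by that factor.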

2. In the case of three 5 by 5 decks, the polynomial is

$F(5, 5, 5)^5 = (1 - 125E + 4000E^2 - 36000E^3 + 72000E^4 - 14400E^5)^5$
$= 1 - 625E + 176{,}250E^2 - 29{,}711{,}250E^3 + 3{,}346{,}063{,}125E^4 - \cdots$

The factorial moments can be obtained using (3):

$M_{(1)} = 625/25^2 = 1$,

$M_{(2)} = 2 \cdot 176250/(25^2 \cdot 24^2) = 47/48$,

$M_{(3)} = 7923/8464$,

$M_{(4)} = 1784567/2048288$,

the first two in agreement with Battin [2].
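
The moments quoted here follow from (3), $M_{(k)} = k!\,S_k\phi_k$, and can be checked with exact rational arithmetic. The sketch below is illustrative only; the helper names are hypothetical and the polynomial is built directly from formula (5).

```python
from fractions import Fraction
from math import comb, factorial


def falling(x, t):
    """Jordan factorial (x)_t."""
    result = 1
    for i in range(t):
        result *= x - i
    return result


def deck_polynomial(a, b, c):
    """Coefficients of F(a, b, c) in powers of E, from formula (5)."""
    return [(-1) ** t * comb(a, t) * falling(b, t) * falling(c, t)
            for t in range(min(a, b, c) + 1)]


def poly_pow(p, exponent):
    out = [1]
    for _ in range(exponent):
        new = [0] * (len(out) + len(p) - 1)
        for i, x in enumerate(out):
            for j, y in enumerate(p):
                new[i + j] += x * y
        out = new
    return out


def factorial_moments(poly, n, decks, upto):
    """M_(k) = k! S_k phi_k, with S_k = (-1)^k * coefficient of E^k
    and phi_k = [(n - k)!/n!]^(decks - 1)."""
    moments = []
    for k in range(1, upto + 1):
        s_k = (-1) ** k * poly[k]
        phi_k = Fraction(factorial(n - k), factorial(n)) ** (decks - 1)
        moments.append(factorial(k) * s_k * phi_k)
    return moments


poly = poly_pow(deck_polynomial(5, 5, 5), 5)   # five denominations of five
moments = factorial_moments(poly, n=25, decks=3, upto=4)
print(poly[:5])
print(moments)
```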

3. The symbolic method can be applied to more intricate kinds of matching,
as this final example shows. Suppose that the six matches represented by
(123) and its permutations are forbidden, likewise the six matches represented
by permutations of (456), and so on in groups of three. Then

$(1 - p_{123})(1 - p_{132})(1 - p_{213})(1 - p_{231})(1 - p_{312})(1 - p_{321}) = 1 - 6E + 6E^2 - 2E^3$,

and so the answer is

$(1 - 6E + 6E^2 - 2E^3)^{n/3}\phi_0$.

The analogous problem for 4 decks has the solution

$(1 - 24E + 108E^2 - 96E^3 + 24E^4)^{n/4}\phi_0$.

The generalisation to an arbitrary number of decks involves the enumeration
of Latin rectangles, in itself a formidable problem.

5. Moment formulas. It is possible to deduce from (4) and (5) fairly explicit
formulas for the factorial moments. Let us define $u^{(t)} = (a)_t(b)_t(c)_t$. Then
(5) may be written symbolically as

$F(a, b, c) = \Sigma_t\,u^{(t)}(-E)^t/t! = \exp(-uE)$.

Writing $F(a_i, b_i, c_i) = \exp(-u_iE)$, we then have

$P_0 = \exp[-(u_1 + u_2 + \cdots)E]\phi_0 = \Sigma_t\,(u_1 + u_2 + \cdots)^t(-E)^t\phi_0/t!$

or finally, if m + 1 decks are being matched,

(6) $P_0 = \Sigma_t\,\dfrac{(-1)^t(u_1 + u_2 + \cdots)^t}{t!\,[(n)_t]^m}$.

It is to be borne in mind that after expansion of $(u_1 + u_2 + \cdots)^t$ by the
multinomial theorem, the term $u_1^xu_2^yu_3^z \cdots$ is replaced by $u_1^{(x)}u_2^{(y)}u_3^{(z)} \cdots$ with the
u's defined as above.

By (3), factorial moments corresponding to (6) are given by

(7) $M_{(t)} = (u_1 + u_2 + \cdots)^t/[(n)_t]^m$.

Thus in particular

$n^mM_{(1)} = u_1 + u_2 + \cdots = \Sigma_i\,a_ib_i \cdots$

$n^m(n - 1)^mM_{(2)} = (u_1 + u_2 + \cdots)^2 = \Sigma_i\,a_i(a_i - 1)b_i(b_i - 1) \cdots + 2\Sigma_{i<j}\,a_ia_jb_ib_j \cdots$

the cases m = 1, 2 in agreement with Battin [2].

In the simple case where m = 1 (two decks), $a_i = b_i = a$ and n = sa, we have
$u^{(t)} = (a)_t^2$ and

(8) $(n)_tM_{(t)} = (u + u + \cdots + u)^t$

with s u's in the parenthesis. The right of (8) is the multi-variable polynomial
of E. T. Bell [3], $Y_t(y_1, y_2, \cdots, y_t)$ with $y_k = (s)u^{(k)}$ and (s) a symbolic factorial
such that $y_iy_j = (s)_2u^{(i)}u^{(j)}$, etc. Instances of (8) may be compared with
Olds [9].

Expanding (8) we obtain

$(n)_tM_{(t)} = (s)_t[u^{(1)}]^t + {}_tC_2\,(s)_{t-1}[u^{(1)}]^{t-2}u^{(2)} + \cdots$

$= (s)_ta^{2t} + {}_tC_2\,(s)_{t-1}a^{2t-2}(a - 1)^2 + \cdots$

and, since $(s)_t/(n)_t \to a^{-t}$ as $n \to \infty$, it follows that $M_{(t)} \to a^t$, i.e., the limiting
distribution is Poisson with mean a. As indicated in [6] one may proceed to
obtain successive terms of an asymptotic series for the distribution. These
results generalize to the case where $M_{(1)}$ approaches a finite limit as
$n \to \infty$. In certain instances where $M_{(1)} \to \infty$, asymptotic normality can be
proved (cf. [1] and [8]).

6. Successions and runs. As shown in [6], enumeration of permutations with
a specified number of 2-successions like 12, 42, $\cdots$ may be accomplished by
introduction of symbols like $q_{12}$, $q_{42}$, denoting probabilities that 1 immediately
precede 2, 4 precede 2, resp. For permutations of n objects $a_1$ of which are of one
kind, $a_2$ of a second, $\cdots$ with $a_1 + a_2 + \cdots + a_s = n$, the probability of exactly
r 2-successions is ([6] p. 914)

(9) $P_r = G(a_1)G(a_2) \cdots G(a_s)\phi_0$

with $\phi_k = (-1)^r\,{}_kC_r\,(n - k)!/n!$ and

$G(a) = \Sigma_{t=0}\,(a)_t(a - 1)_t(-E)^t/t!$.

It is to be noted that in deriving (9), elements of the first kind are numbered
1 to $a_1$, of the second $a_1 + 1$ to $a_1 + a_2$, $\cdots$ and a succession occurs if either
i precedes j or j precedes i with i and j in the same set.

For s = 2, i.e., two kinds of elements, there is a simpler formula due to Stevens
[10], but for the general case (9) seems to be the only reasonably explicit solution
known. In particular, for the function $F(a_1, \cdots, a_s)$ of Mood [8] which
enumerates the number of permutations with no 2-successions, we have

$F(a_1, \cdots, a_s) = n!\,G(a_1) \cdots G(a_s)\phi_0$.

Factorial moments for 2-successions are given at once by (7):

(10) $M_{(t)} = (u_1 + u_2 + \cdots + u_s)^t/(n)_t$

with $u_i^{(j)} = (a_i)_j(a_i - 1)_j$.

It is more usual to classify permutations according to the number of runs,
say r', a run consisting of a succession of i like elements (i = 1, 2, $\cdots$). Since
every 2-succession causes the loss of a potential run, we have r' = n - r, i.e. the
number of runs is n diminished by the number of 2-successions. Factorial
moments $M'_{(t)}$ for runs are then given by the usual formula for change of origin:

(11) $M'_{(t)} = \sum_{i=0}^{t} (-1)^i\,{}_tC_i\,(n - i)_{t-i}\,M_{(i)}$.

Examples. 1. Introducing $\alpha_t$ for the t-th elementary symmetric function
of the a's,

$\alpha_1 = a_1 + a_2 + \cdots + a_s = n$,
$\alpha_2 = a_1a_2 + a_1a_3 + \cdots + a_{s-1}a_s$,
$\alpha_3 = a_1a_2a_3 + \cdots$,

we may derive from (10) and (11) the formula

(12) $M'_{(1)} = 1 + 2\alpha_2/n$

for the mean number of runs. The variance $\sigma^2$, the same for runs and 2-successions,
is given by

(13) $\sigma^2 = M_{(2)} + M_{(1)} - M_{(1)}^2 = \dfrac{2\alpha_2(2\alpha_2 - n) - 6n\alpha_3}{n^2(n - 1)}$.

For runs of two kinds of elements, formulas (12) and (13) specialize to those
given by Wald and Wolfowitz [11].
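
Formulas (12) and (13) can be checked against a direct enumeration for small compositions. The sketch below is an illustration, not part of the original derivation: the function names are hypothetical, the elementary symmetric functions of the a's are called `alpha2` and `alpha3`, and the variance is taken in the form $[2\alpha_2(2\alpha_2 - n) - 6n\alpha_3]/[n^2(n - 1)]$, which is an assumption of this sketch where the printed denominator is unclear.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod


def run_moments(counts):
    """Exact mean and variance of the number of runs when all n!
    orderings of the numbered elements are equally likely."""
    kinds = [kind for kind, c in enumerate(counts) for _ in range(c)]
    n = len(kinds)
    total = total_sq = orderings = 0
    for order in permutations(range(n)):
        seq = [kinds[i] for i in order]
        runs = 1 + sum(seq[i] != seq[i + 1] for i in range(n - 1))
        total += runs
        total_sq += runs * runs
        orderings += 1
    mean = Fraction(total, orderings)
    return mean, Fraction(total_sq, orderings) - mean ** 2


def elementary_symmetric(values, t):
    """Sum of the products of the values taken t at a time."""
    return sum(prod(combo) for combo in combinations(values, t))


def formula_moments(counts):
    """Mean from (12) and variance from (13) as assumed above."""
    n = sum(counts)
    alpha2 = elementary_symmetric(counts, 2)
    alpha3 = elementary_symmetric(counts, 3)
    mean = 1 + Fraction(2 * alpha2, n)
    variance = Fraction(2 * alpha2 * (2 * alpha2 - n) - 6 * n * alpha3,
                        n * n * (n - 1))
    return mean, variance


for composition in [(2, 1), (2, 1, 1), (2, 2, 1)]:
    print(composition, run_moments(composition), formula_moments(composition))
```

For two kinds of elements $\alpha_3$ vanishes and the variance reduces to $2a_1a_2(2a_1a_2 - n)/[n^2(n - 1)]$.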

2. For runs of elements of a single kind, factors in (9) pertaining to other
elements are suppressed. Thus if a is written for $a_1$, and terms in $a_2, \cdots, a_s$ are
suppressed, (9) and (10) become

$P_r = G(a)\phi_0$,

$M_{(t)} = (a)_t(a - 1)_t/(n)_t$.

Moments for runs are given by

$M'_{(t)} = \sum_{i=0}^{t} (-1)^i\,{}_tC_i\,(a - i)_{t-i}\,M_{(i)} = (a)_t(n - a + 1)_t/(n)_t$

in agreement with Mood [8].
ON THE POWER FUNCTIONS OF THE $E^2$-TEST AND THE $T^2$-TEST

By P. L. Hsu 

National University of Peking 


1. The general linear hypothesis. Every linear hypothesis about a p-variate
normal population or several such populations having common variances and
covariances is reducible to the following canonical form [4]: The sample
distribution, when nothing whatever has been discarded from the whole sample, being

(1)
$$(2\pi)^{-\frac12 p(n_2+n)}\,|\alpha_{ij}|^{\frac12(n_2+n)} \exp\Big\{-\tfrac12 \sum_{i,j=1}^{p} \alpha_{ij} \sum_{r=1}^{n_2} (y_{ir} - \eta_{ir})(y_{jr} - \eta_{jr}) - \tfrac12 \sum_{i,j=1}^{p} \alpha_{ij} \sum_{r=1}^{n} z_{ir}z_{jr}\Big\}\,\Pi\,dy\,dz \qquad (n \ge p),$$

where the $\eta_{ir}$ and the $\alpha_{ij}$ are unknown, the hypothesis to be tested is

H: $\eta_{ir} = 0$ $(i = 1, \cdots, p;\ r = 1, \cdots, n_1;\ n_1 \le n_2)$.

It is clear that the $y_{ir}$ $(i = 1, \cdots, p;\ r = n_1 + 1, \cdots, n_2)$ can have no use.
Also, the only useful quantities supplied by the set $z_{ir}$ are the statistics

$b_{ij} = \sum_{r=1}^{n} z_{ir}z_{jr}$

because the remaining quantities may be regarded as a set of angles which are
independent of $y_{ir}$ and the $b_{ij}$ and which has a known distribution free from any
unknown parameter in (1), [2]. After discarding the irrelevant y's and the angles
there results the reduced sample distribution

$$K\,|\alpha_{ij}|^{\frac12(n_1+n)}\,|b_{ij}|^{\frac12(n-p-1)} \exp\Big\{-\tfrac12 \sum_{i,j=1}^{p} \alpha_{ij} \sum_{r=1}^{n_1} (y_{ir} - \eta_{ir})(y_{jr} - \eta_{jr}) - \tfrac12 \sum_{i,j=1}^{p} \alpha_{ij}b_{ij}\Big\}\,\Pi\,dy\,db.$$

Hereafter the indices i, j and r shall have the following ranges:
$i, j = 1, \cdots, p$; $r = 1, \cdots, n_1$,

and the convention that repetition of an index indicates summation will be
adopted. Writing

$a_{ij} = y_{ir}y_{jr}$, $c_{ij} = a_{ij} + b_{ij}$,

In the remaining two sections of this paper we deal exclusively with the
special cases p = 1 and $n_1 = 1$. According as p = 1 or $n_1 = 1$ we shall drop
the indices i and j or the index r.

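The case p = 1. When p = 1, (2) reduces to

$$K\alpha^{\frac12(n_1+n)}(c - y_ry_r)^{\frac12 n-1} \exp(-\tfrac12\alpha c + \alpha y_r\eta_r - \tfrac12\alpha\eta_r\eta_r)\,dc\,\Pi\,dy.$$

Putting $y_r = c^{\frac12}x_r$ we obtain

(3)
$$K\alpha^{\frac12(n_1+n)}c^{\frac12(n_1+n)-1}(1 - x_rx_r)^{\frac12 n-1} \exp(-\tfrac12\alpha c + \alpha c^{\frac12}x_r\eta_r - \tfrac12\alpha\eta_r\eta_r)\,dc\,\Pi\,dx.$$

The hypothesis H is now

H': $\eta_r = 0$ $(r = 1, \cdots, n_1)$.

If w is any critical region for the rejection of H', denote by w(c) the cross
section of w for every fixed c. Then the power function of w is

(4)
$$\beta_w(\eta, \alpha) = \beta_w(\eta_1, \cdots, \eta_{n_1}, \alpha) = K\alpha^{\frac12(n_1+n)}e^{-\frac12\alpha\eta_r\eta_r} \int_0^\infty c^{\frac12(n_1+n)-1}e^{-\frac12\alpha c}\,dc \int_{w(c)} (1 - x_rx_r)^{\frac12 n-1}e^{\alpha c^{1/2}x_r\eta_r}\,\Pi\,dx.$$

It is known [3] that, in order to have

(5) $\beta_w(0, \alpha) = \epsilon$

for all $\alpha$, it is necessary and sufficient that

(6) $\int_{w(c)} (1 - x_rx_r)^{\frac12 n-1}\,\Pi\,dx = A\epsilon$,

where A is a constant.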

The $E^2$-test is the test based on the critical region

$w_0$: $x_rx_r = c^{-1}y_ry_r = E^2 \ge$ const.

The author has proved [3] that of all the critical regions which satisfy (5) and
whose power function is a function of $\alpha\eta_r\eta_r$ alone, the region $w_0$ is the uniformly
most powerful one. This result is generalized by Wald [7], who proved that, of
all the regions satisfying (5), the surface integral

$\gamma_w(\alpha, \lambda) = \int_{\eta_r\eta_r=\lambda} \beta_w(\eta, \alpha)\,dA$

is maximum when w is $w_0$. The author gives here another proof of Wald's
theorem which is easier as it dispenses with the somewhat intricate Lemma 1
of Wald. From (4) we have

$$\gamma_w(\alpha, \lambda) = K\alpha^{\frac12(n_1+n)} \int_0^\infty c^{\frac12(n_1+n)-1}e^{-\frac12\alpha c}\,dc \int_{w(c)} (1 - x_rx_r)^{\frac12 n-1}\,\Pi\,dx \int_{\eta_r\eta_r=\lambda} \exp(-\tfrac12\alpha\eta_r\eta_r + \alpha c^{\frac12}x_r\eta_r)\,dA.$$

By means of a rotation in the space of $(\eta_1, \cdots, \eta_{n_1})$ we can obtain

$$\int_{\eta_r\eta_r=\lambda} \exp(-\tfrac12\alpha\eta_r\eta_r + \alpha c^{\frac12}x_r\eta_r)\,dA = \int_{\zeta_r\zeta_r=\lambda} \exp(-\tfrac12\alpha\zeta_r\zeta_r + \alpha c^{\frac12}(x_rx_r)^{\frac12}\zeta_1)\,dA = \sum_{k=0}^{\infty} a_k\alpha^{2k}(cx_rx_r)^k,$$

where $a_k$ depends only on $\alpha$, k and $\lambda$. Hence

(7)
$$\gamma_w(\alpha, \lambda) = \sum_{k=0}^{\infty} b_k \int_0^\infty c^{\frac12(n_1+n)+k-1}e^{-\frac12\alpha c}\,dc \int_{w(c)} (x_rx_r)^k(1 - x_rx_r)^{\frac12 n-1}\,\Pi\,dx,$$

where $b_k$ depends only on k, $\alpha$ and $\lambda$. Since w(c) satisfies (6), it follows from a
lemma of Neyman and Pearson [5] that

$\int_{w(c)} (x_rx_r)^k(1 - x_rx_r)^{\frac12 n-1}\,\Pi\,dx$

is maximum, for all c and k, when w(c) is the region $x_rx_r \ge$ const., i.e. when w
is itself the region $x_rx_r \ge$ const. This proves Wald's theorem.

Still another optimum property of the $E^2$-test may be established on using
the volume integral instead of the surface integral. This is stated in the
following theorem.

Theorem 1. Let S be any linear set and let

$\varphi_w(\alpha, S) = \int_{\eta_r\eta_r \in S} \beta_w(\eta, \alpha)\,\Pi\,d\eta$.

Of all the regions satisfying (5), the region $w_0$ has the maximum $\varphi_w(\alpha, S)$.

For, by the same computation which leads to (7), we easily obtain

$$\varphi_w(\alpha, S) = \sum_{k=0}^{\infty} c_k \int_0^\infty c^{\frac12(n_1+n)+k-1}e^{-\frac12\alpha c}\,dc \int_{w(c)} (x_rx_r)^k(1 - x_rx_r)^{\frac12 n-1}\,\Pi\,dx,$$

where $c_k$ depends only on k, $\alpha$ and S. Hence the result follows.

This theorem also contains my previous result as a consequence. For, writing

$\beta_w(\eta, \alpha) = f(\alpha\eta_r\eta_r)$, $\beta_{w_0}(\eta, \alpha) = f_0(\alpha\eta_r\eta_r)$,

we have

$0 \le \int_{\eta_r\eta_r \in S} (f_0(\alpha\eta_r\eta_r) - f(\alpha\eta_r\eta_r))\,\Pi\,d\eta = K \int_S t^{\frac12 n_1-1}(f_0(\alpha t) - f(\alpha t))\,dt$.

Since S is arbitrary, we must have $f(\alpha t) \le f_0(\alpha t)$.

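The case $n_1 = 1$. When $n_1 = 1$, (2) and H become respectively

(8)
$$K\,|\alpha_{ij}|^{\frac12(n+1)}\,|c_{ij} - y_iy_j|^{\frac12(n-p-1)} \exp(-\tfrac12\alpha_{ij}c_{ij} + \alpha_{ij}y_i\eta_j - \tfrac12\alpha_{ij}\eta_i\eta_j)\,\Pi\,dy\,dc,$$

H'': $\eta_i = 0$ $(i = 1, \cdots, p)$.

There is a unique real matrix

$$T = \begin{bmatrix}
t_{11} & & & \\
t_{21} & t_{22} & & \\
\vdots & \vdots & \ddots & \\
t_{p1} & t_{p2} & \cdots & t_{pp}
\end{bmatrix} \qquad (t_{ii} > 0;\ \text{zeros above the principal diagonal})$$

such that $[c_{ij}] = TT'$ [2]. Introducing the new variables $x_1, \cdots, x_p$ by means
of the transformation

(9) $[y_1, \cdots, y_p] = [x_1, \cdots, x_p]T'$

with the Jacobian $|T| = |c_{ij}|^{\frac12}$ we obtain the distribution

(10)
$$f(x, c)\,\Pi\,dx\,dc = K\,|\alpha_{ij}|^{\frac12(n+1)}\,|c_{ij}|^{\frac12(n-p)}(1 - x_ix_i)^{\frac12(n-p-1)} \exp(-\tfrac12\alpha_{ij}c_{ij} + \alpha_{ij}t_{ik}x_k\eta_j - \tfrac12\alpha_{ij}\eta_i\eta_j)\,\Pi\,dx\,dc$$

$(k = 1, \cdots, p;\ t_{ik} = 0$ when $k > i)$.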
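
The triangular matrix T with positive diagonal such that $[c_{ij}] = TT'$ is what is usually called a Cholesky factor, and it can be computed row by row. The sketch below is a minimal illustration, not taken from the paper: the function name, the plain floating point arithmetic and the 2 by 2 test matrix are assumptions made here.

```python
from math import sqrt


def lower_triangular_factor(c):
    """Return the lower triangular T with positive diagonal such that
    c = T T', for a symmetric positive definite matrix c."""
    n = len(c)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            # Part of (T T')[i][k] already accounted for by earlier columns.
            partial = sum(t[i][m] * t[k][m] for m in range(k))
            if i == k:
                remainder = c[i][i] - partial
                if remainder <= 0:
                    raise ValueError("matrix is not positive definite")
                t[i][i] = sqrt(remainder)
            else:
                t[i][k] = (c[i][k] - partial) / t[k][k]
    return t


c = [[4.0, 2.0], [2.0, 3.0]]
t = lower_triangular_factor(c)
# Rebuild c from T T' as a check on the factorization.
rebuilt = [[sum(t[i][m] * t[j][m] for m in range(2)) for j in range(2)]
           for i in range(2)]
print(t)
print(rebuilt)
```

The determinant of T is the product of its diagonal elements, which equals $|c_{ij}|^{\frac12}$, the Jacobian quoted in the text.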

If w is any region, we write

$\beta_w(\eta, \alpha) = \beta_w(\eta_1, \cdots, \eta_p;\ \alpha_{11}, \alpha_{12}, \cdots, \alpha_{pp}) = \int_w f(x, c)\,\Pi\,dx\,dc$

so that $\beta_w(\eta, \alpha)$ is the power function if w serves as a critical region for rejecting
H''. We have, symbolically,

$w = D \times w(c)$,

where D is the set of points $(c_{ij})$ for which $[c_{ij}]$ is positive definite and w(c) is
the cross section of w for fixed $c_{ij}$. Then

$$\beta_w(\eta, \alpha) = K\,|\alpha_{ij}|^{\frac12(n+1)}e^{-\frac12\alpha_{ij}\eta_i\eta_j} \int_D |c_{ij}|^{\frac12(n-p)}e^{-\frac12\alpha_{ij}c_{ij}}\,\Pi\,dc \int_{w(c)} (1 - x_ix_i)^{\frac12(n-p-1)}e^{\alpha_{ij}t_{ik}x_k\eta_j}\,\Pi\,dx.$$

It is known [6] that, in order to have

(11) $\beta_w(0, \alpha) = \epsilon$

for all $\alpha_{ij}$, it is necessary and sufficient that

(12) $\int_{w(c)} (1 - x_ix_i)^{\frac12(n-p-1)}\,\Pi\,dx = B\epsilon$,

where $B = \int_{x_ix_i \le 1} (1 - x_ix_i)^{\frac12(n-p-1)}\,\Pi\,dx$.

The $T^2$-test is the test based on the critical region

$w_0$: $x_ix_i = c^{ij}y_iy_j = T^2/(1 + T^2) \ge$ const., or $T^2 \ge$ const.,

where $c^{ij}$ is the general element of $[c_{ij}]^{-1}$ and $T^2$ is, except for a constant factor,
Hotelling's generalization of “Student's” ratio.

In order to establish an optimum property of $T^2$ analogous to that of $E^2$ given
in Theorem 1, we define, for any linear set S and any region R in the sample
space,

$\psi_R(S) = \int_A \Pi\,d\alpha \int_{\alpha_{ij}\eta_i\eta_j \in S} \beta_R(\eta, \alpha)\,\Pi\,d\eta$.

$\psi_R(S)$ does not necessarily have a finite value, and it is this fact which renders
the following theorem less satisfactory than Theorem 1.

Theorem 2. Let $\rho_p$ be the smallest latent root of $[c_{ij}]$ and let E be any subset
of D in which $\rho_p$ is at least equal to a fixed positive constant. Of all the critical
regions w which satisfy (11), the region $w_0$ has the maximum $\psi_{wE}(S)$.

In order to prove this theorem we need the following two lemmas.

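Lemma 1. If c is a positive constant, the integral

$I = \int_{\rho_p \ge c} |c_{ij}|^{-(p+\frac12)}\,\Pi\,dc$

has a finite value.

Proof. Let $\rho_1, \cdots, \rho_p$ be the latent roots of $[c_{ij}]$ in the descending order
of magnitude. From a known theorem [1] we get

$$I = \text{const.} \int_{c \le \rho_p \le \cdots \le \rho_1 < \infty} (\rho_1 \cdots \rho_p)^{-(p+\frac12)} \prod_{i<j} (\rho_i - \rho_j)\,\Pi\,d\rho \le \text{const.} \int_c^\infty \cdots \int_c^\infty \Big(\prod_{i=1}^{p} \rho_i^{-(i+\frac12)}\Big)\,d\rho_1 \cdots d\rho_p.$$

Hence I is finite.

Lemma 2.

(13)
$$\psi_{wE}(S) = \sum_{k=0}^{\infty} g_k \int_E |c_{ij}|^{-(p+\frac12)}\,\Pi\,dc \int_{w(c)} (1 - x_ix_i)^{\frac12(n-p-1)}(x_ix_i)^k\,\Pi\,dx$$

and $\psi_{wE}(S)$ is finite, where $g_k$ depends only on k and S.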

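Proof. Let A be the set of points $(\alpha_{ij})$ for which $[\alpha_{ij}]$ is positive definite.
By (8), we have

$$\psi_{wE}(S) = K \int_{wE} |c_{ij} - y_iy_j|^{\frac12(n-p-1)}\,\Pi\,dy\,dc \int_A |\alpha_{ij}|^{\frac12(n+1)}e^{-\frac12 c_{ij}\alpha_{ij}}\,J\,\Pi\,d\alpha,$$

where

$J = \int_{\alpha_{ij}\eta_i\eta_j \in S} \exp(-\tfrac12\alpha_{ij}\eta_i\eta_j + \alpha_{ij}y_i\eta_j)\,\Pi\,d\eta$.

There is a real non-singular matrix $G = [g_{ij}]$ such that $[\alpha_{ij}] = GG'$. Using the
transformation

$[\eta_1, \cdots, \eta_p]G = [\xi_1, \cdots, \xi_p]$,

whose Jacobian is $|G|^{-1} = |\alpha_{ij}|^{-\frac12}$, we have

(14) $J = |\alpha_{ij}|^{-\frac12} \int_{\xi_i\xi_i \in S} \exp(-\tfrac12\xi_i\xi_i + g_{ji}y_j\xi_i)\,\Pi\,d\xi$.

This is reducible by means of a rotation to

$$J = |\alpha_{ij}|^{-\frac12} \int_{\tau_i\tau_i \in S} \exp(-\tfrac12\tau_i\tau_i + (\alpha_{ij}y_iy_j)^{\frac12}\tau_1)\,\Pi\,d\tau = |\alpha_{ij}|^{-\frac12} \sum_{k=0}^{\infty} d_k(\alpha_{ij}y_iy_j)^k,$$

where

$d_k = \dfrac{1}{(2k)!} \int_{\tau_i\tau_i \in S} \tau_1^{2k}e^{-\frac12\tau_i\tau_i}\,\Pi\,d\tau$

and $d_k$ depends only on k and S. Hence

$\int_A |\alpha_{ij}|^{\frac12(n+1)}e^{-\frac12 c_{ij}\alpha_{ij}}\,J\,\Pi\,d\alpha = \sum_{k=0}^{\infty} d_kI_k$,

where

(15) $I_k = \int_A |\alpha_{ij}|^{\frac12 n}(\alpha_{ij}y_iy_j)^k e^{-\frac12 c_{ij}\alpha_{ij}}\,\Pi\,d\alpha$.

Now

$I_k = \Big[\dfrac{d^k}{dt^k}\gamma(t)\Big]_{t=0}$,

where

$$\gamma(t) = \int_A |\alpha_{ij}|^{\frac12 n}e^{-\frac12\alpha_{ij}(c_{ij} - 2ty_iy_j)}\,\Pi\,d\alpha = K_1\,|c_{ij} - 2ty_iy_j|^{-\frac12(n+p+1)} = K_1\,|c_{ij}|^{-\frac12(n+p+1)}(1 - 2tc^{ij}y_iy_j)^{-\frac12(n+p+1)}.$$

Hence

(16) $I_k = e_k\,|c_{ij}|^{-\frac12(n+p+1)}(c^{ij}y_iy_j)^k$,

where

$e_k = K_1\,2^k\,\Gamma\big(\tfrac12(n+p+1) + k\big)\big/\Gamma\big(\tfrac12(n+p+1)\big)$.

Hence

$$\psi_{wE}(S) = K \sum_{k=0}^{\infty} d_ke_k \int_{wE} |c_{ij}|^{-\frac12(n+p+1)}\,|c_{ij} - y_iy_j|^{\frac12(n-p-1)}(c^{ij}y_iy_j)^k\,\Pi\,dy\,dc$$

$$= \sum_{k=0}^{\infty} g_k \int_E |c_{ij}|^{-(p+\frac12)}\,\Pi\,dc \int_{w(c)} (1 - x_ix_i)^{\frac12(n-p-1)}(x_ix_i)^k\,\Pi\,dx,$$

where $g_k = Kd_ke_k$ depends only on k and S.

Now

$\int_{w(c)} (1 - x_ix_i)^{\frac12(n-p-1)}(x_ix_i)^k\,\Pi\,dx \le \int_{x_ix_i \le 1} \Pi\,dx$,

and

$\int_E |c_{ij}|^{-(p+\frac12)}\,\Pi\,dc \le \int_{\rho_p \ge c} |c_{ij}|^{-(p+\frac12)}\,\Pi\,dc$

is finite by Lemma 1. Hence

$\psi_{wE}(S) \le \text{const.} \sum_{k=0}^{\infty} d_ke_k$

and so $\psi_{wE}(S)$ is finite. This proves Lemma 2.

Proof of Theorem 2. Since $\psi_{wE}(S)$ is expressible as (13) and is always finite,
it follows from (12) and the Neyman-Pearson lemma that $\psi_{wE}(S)$ is maximum
when w is $w_0$. This proves Theorem 2.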

Simaika [6] proved that of all the critical regions w which satisfy the conditions

(a) $\beta_w(0, \alpha) = \epsilon$ for all $\alpha_{ij}$,

(b) $\beta_w(\eta, \alpha) = f(\alpha_{ij}\eta_i\eta_j)$,

$w_0$ is the uniformly most powerful one. Strangely enough, this result cannot
be deduced as a consequence from our Theorem 2.

The difficulty in dealing with the integral $\psi_w(S)$ is that it is not always finite.
In order to have a finite integral let us consider the following:

$\Gamma_w(\theta, S) = \int_A e^{-\frac12\theta_{ij}\alpha_{ij}}\,\Pi\,d\alpha \int_{\alpha_{ij}\eta_i\eta_j \in S} \beta_w(\eta, \alpha)\,\Pi\,d\eta$,

where $[\theta_{ij}]$ is a positive definite matrix. As an immediate consequence of
Simaika's theorem we have

(17) $\Gamma_w(\theta, S) \le \Gamma_{w_0}(\theta, S)$

for any region w satisfying (a) and (b). Now the question arises whether
(17) remains true if the condition (b) on w is removed. The following theorem
answers this question in the negative.

Theorem 3. Let $[\theta_{ij}]$ be a positive definite matrix, $[\rho_{ij}] = [c_{ij} + \theta_{ij}]^{-1}$ and
$\lambda_1, \cdots, \lambda_p$ be the roots of the equation $|c_{ij} - \lambda\theta_{ij}| = 0$. There is a function
$g = g(\lambda_1, \cdots, \lambda_p)$ such that the region

$w_1$: $\rho_{ij}y_iy_j \ge g(\lambda_1, \cdots, \lambda_p)$

satisfies (a) and has the maximum $\Gamma_w(\theta, S)$.


(*±i±i+») 

<<•!>* 
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Proof. From (10) and (14) we obtain 
T a (6, S) = K'E dk f | Cn - y ( yj n dy dc 

0 J V> 

■ / |««|*“ da. 

Comparing the inner integral with (15) and using (16) we get 

T v (e,S) =tok I I *, + 8 (i r K " +?+1) I Cl ,. - Vi y } ^-’-"(p^ytfndydc 

Jfc—0 •'to 

(18) =11 Ok 1 1 Cy + e iS r ,ln+p+1) | C(, | ,(n - p> n dc 

fc-0 "D 

■ f (i - 

*'u>(c) 

where $\gamma_{ij}x_i x_j$ is the result of applying the transformation (9) on $\rho_{ij}y_i y_j$. We
shall show that, for every fixed set of $c_{ij}$, a unique number $g = g(\lambda_1, \dots, \lambda_p)$
exists such that the region $\rho_{ij}y_i y_j = \gamma_{ij}x_i x_j > g$ satisfies (12), i.e.

(19) $\int_{\gamma_{ij}x_i x_j > g} (1 - x_i x_i)^{\frac{1}{2}(n-p-1)} \prod dx = B\epsilon.$

Since $[\gamma_{ij}] = T'[c_{ij} + \theta_{ij}]^{-1}T$, the latent roots of $[\gamma_{ij}]$ are $\lambda_i/(1 + \lambda_i)$ $(i = 1,
\dots, p)$. Hence by a rotation the equation (19) is reduced to

(20) f (1 - i < ^,•) 1(n ~’’“ 1, n di = B f . 

As $g$ increases from 0 onwards, the left member of (20) decreases steadily from
$B$ to 0. Hence there is a unique $g = g(\lambda_1, \dots, \lambda_p)$ which satisfies (20).

For this $g(\lambda_1, \dots, \lambda_p)$ the region $w_1$ satisfies (a). Hence, applying the
Neyman-Pearson Lemma on (18) we obtain the result.

From Theorem 3 we learn that there actually exist other exact tests for H"
which have some optimum property not possessed by $T^2$, viz., the tests based
on the critical regions $w_1$ corresponding to various values of the $\theta_{ij}$. However,
the great difficulty in numerical computation prohibits their application and the
$T^2$-test stands out as the only test which is both simple and good.
SOME GENERALIZATIONS OF THE THEORY OF CUMULATIVE SUMS
OF RANDOM VARIABLES


By Abraham Wald 
Columbia University 

1. Introduction. In a previous paper [1] the author dealt with the following
problem: Let $\{z_i\}$ (i = 1, 2, ..., ad inf.) be a sequence of independently
distributed random variables each having the same distribution. Let a be a given
positive constant, b a given negative constant and denote by n the smallest
positive integer for which either

(1) $z_1 + \cdots + z_n \ge a$

or

(2) $z_1 + \cdots + z_n \le b$

holds. The main problems treated in [1] were: (1) Derivation of the probability
that the cumulative sum reaches the boundary a before the boundary b is reached;
(2) Derivation of the characteristic function and the distribution function of n.
In this paper we shall consider the following more general problem: Let
$K = \{k_i(z_1, \dots, z_i)\}$ (i = 1, 2, ..., ad inf.) be a given sequence of functions
and let n be the smallest positive integer for which either

(3) $k_n(z_1, \dots, z_n) \ge 1$

or

(4) $k_n(z_1, \dots, z_n) \le -1$

holds. No restrictions are imposed on the sequence K except that it must be
such that the probability that $n < \infty$ is equal to one. The purpose of this
paper is to derive some theorems concerning the probability that
$k_n(z_1, \dots, z_n) \ge 1$ and concerning the expected value of n. Obviously, the
problem formulated here is a generalization of that considered in [1], since the
latter can be obtained by putting

$k_i(z_1, \dots, z_i) = \dfrac{2}{a - b}(z_1 + \cdots + z_i) - \dfrac{a + b}{a - b}.$
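The stopping rule defined by (3) and (4), together with the special choice of $k_i$ that recovers (1) and (2), can be sketched as follows. This is only an illustration under stated assumptions: the names `linear_k` and `stopping_time` are hypothetical, the observations are taken to be integers so that exact rational arithmetic can be used, and the boundaries are read as inclusive.

```python
from fractions import Fraction
from typing import Callable, Iterable, Sequence, Tuple


def linear_k(a: int, b: int) -> Callable[[Sequence[int]], Fraction]:
    """Return k_i(z_1..z_i) = 2/(a-b) * (z_1+...+z_i) - (a+b)/(a-b)."""
    if not (a > 0 > b):
        raise ValueError("expected a > 0 > b")
    scale = Fraction(2, a - b)
    shift = Fraction(a + b, a - b)

    def k(zs: Sequence[int]) -> Fraction:
        return scale * sum(zs) - shift

    return k


def stopping_time(zs: Iterable[int], k) -> Tuple[int, Fraction]:
    """Smallest n with k_n >= 1 or k_n <= -1, together with the value k_n."""
    seen = []
    for z in zs:
        seen.append(z)
        value = k(seen)
        if value >= 1 or value <= -1:
            return len(seen), value
    raise ValueError("no boundary reached within the supplied observations")


k = linear_k(a=3, b=-2)
upper = stopping_time([1, -1, 1, 1, 1], k)   # partial sums 1, 0, 1, 2, 3
lower = stopping_time([-1, -1, 1, 1], k)     # partial sums -1, -2, then stop
```

With a = 3 and b = -2 the value $k_n = 1$ corresponds to the cumulative sum reaching a, and $k_n = -1$ to its reaching b, which is the sense in which the problem of [1] is a special case.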

2. The conjugate distribution of z. Let z be a random variable whose
distribution is equal to the common distribution of the $z_i$. In this section we
shall introduce the notion of the conjugate distribution of z which will be used
later. According to Lemma 2 in [1], under some weak restrictions on the
distribution of z there exists exactly one real value $h_0 \ne 0$ such that

(5) $E(e^{z h_0}) = 1$

where E(u) denotes the expected value of u for any random variable u.
For simplicity we shall assume that z has a continuous distribution admitting
a probability density everywhere, or that z has a discrete distribution. By the
probability distribution f(z) of z we shall mean the probability density of z, if
the distribution of z is continuous. In the discrete case f(z) will denote the
probability that the random variable takes the value z. From (5) it follows that

(6) $f^*(z) = e^{z h_0} f(z)$

is a probability distribution. We shall call $f^*(z)$ the conjugate distribution of z.
For any random variable u we shall denote by $E^*(u)$ the expected value of u
under the assumption that the distribution of z is given by $f^*(z)$. The expected
values E(u) and E*(u) may depend on the sequence $K = \{k_i(z_1, \dots, z_i)\}$
(i = 1, 2, ..., ad inf.). Occasionally we shall put this dependence in evidence
by writing E(u | K) and E*(u | K), respectively.
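For a discrete distribution, $h_0$ in (5) and the conjugate distribution (6) can be computed numerically. The sketch below is an illustration, not the paper's method: the helper names are hypothetical, the distribution is assumed to take both positive and negative values with non-zero mean, and the root is found by plain bisection on the moment generating function.

```python
import math


def mgf(dist, h):
    """E[exp(h z)] for a discrete distribution given as {value: probability}."""
    return sum(p * math.exp(h * z) for z, p in dist.items())


def solve_h0(dist, iterations=200):
    """Nonzero root of E[exp(h z)] = 1, located by bisection."""
    mean = sum(z * p for z, p in dist.items())
    if mean == 0:
        raise ValueError("E z = 0: no nonzero root is expected")
    sign = -1.0 if mean > 0 else 1.0    # the root lies on the side opposite to E z
    hi = 1.0
    while mgf(dist, sign * hi) <= 1.0:  # grow until the m.g.f. exceeds 1
        hi *= 2.0
    lo = hi
    while mgf(dist, sign * lo) >= 1.0:  # shrink until the m.g.f. is below 1
        lo /= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mgf(dist, sign * mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)


def conjugate(dist):
    """Return (f*, h0) with f*(z) = exp(z h0) f(z)."""
    h0 = solve_h0(dist)
    return {z: math.exp(h0 * z) * p for z, p in dist.items()}, h0


f = {1: 0.4, -1: 0.6}
f_star, h0 = conjugate(f)
```

For this two-point example $h_0 = \log(0.6/0.4)$, and the conjugate distribution simply interchanges the two probabilities.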

3. Two theorems. In this section we shall derive two theorems. The first
theorem is concerned with the probability that $k_n(z_1, \dots, z_n) \ge 1$ and the
second theorem with the expected value of n. In what follows the operator $E_1$
will mean conditional expected value under the restriction that
$k_n(z_1, \dots, z_n) \ge 1$ and $E_2$ will mean conditional expected value under the
restriction that $k_n(z_1, \dots, z_n) \le -1$. If the distribution of z is given by
$f^*(z)$, these conditional expected values will be denoted by the operators
$E_1^*$ and $E_2^*$, respectively.

Theorem 1. Let $K = \{k_i(z_1, \dots, z_i)\}$ be a sequence such that the probability
that $n < \infty$ is equal to one under both distributions f(z) and f*(z). Let $\gamma$ denote
the probability that $k_n(z_1, \dots, z_n) \ge 1$ when f(z) is the distribution of z, and let
$\gamma^*$ denote the probability of the same event when f*(z) is the distribution of z. Then

(7) $E_1(e^{Z_n h_0} \mid K) = \dfrac{\gamma^*}{\gamma}$; $\quad E_2(e^{Z_n h_0} \mid K) = \dfrac{1 - \gamma^*}{1 - \gamma}$

and

(8) $E_1^*(e^{-Z_n h_0} \mid K) = \dfrac{\gamma}{\gamma^*}$; $\quad E_2^*(e^{-Z_n h_0} \mid K) = \dfrac{1 - \gamma}{1 - \gamma^*}$

where $Z_n = z_1 + \cdots + z_n$.
Proof: From (6) it follows that

(9) $e^{Z_n h_0} = \dfrac{f^*(z_1) \cdots f^*(z_n)}{f(z_1) \cdots f(z_n)}$

and

(10) $e^{-Z_n h_0} = \dfrac{f(z_1) \cdots f(z_n)}{f^*(z_1) \cdots f^*(z_n)}.$

A set $(z_1, \dots, z_n)$ will be said to be of type 1 if and only if
$-1 < k_m(z_1, \dots, z_m) < 1$ for m = 1, ..., n - 1 and $k_n(z_1, \dots, z_n) \ge 1$.
Similarly a set $(z_1, \dots, z_n)$ will be said to be of type 2 if and only if
$-1 < k_m(z_1, \dots, z_m) < 1$ for m = 1, ..., n - 1 and $k_n(z_1, \dots, z_n) \le -1$.
We shall prove Theorem 1 under the assumption that the distribution of z is
discrete. Because of (9) we have

(11) $E_1(e^{Z_n h_0} \mid K) = \dfrac{\sum \dfrac{f^*(z_1) \cdots f^*(z_n)}{f(z_1) \cdots f(z_n)}\, f(z_1) \cdots f(z_n)}{\sum f(z_1) \cdots f(z_n)} = \dfrac{\sum f^*(z_1) \cdots f^*(z_n)}{\sum f(z_1) \cdots f(z_n)}$

where the summation is to be taken over all sets $(z_1, \dots, z_n)$ of type 1. But
the last expression is obviously equal to $\gamma^*/\gamma$ and, therefore, the first equation in
(7) is proved. The second equation in (7) follows in the same manner if we take
into account the fact that the probability that $n < \infty$ is equal to one. Similarly,
equation (8) can be obtained from (10). The proof can easily be extended to
the case when the distribution of z is continuous. Hence, Theorem 1 is proved.
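Equation (7) can be checked exactly in the simplest case, a walk with steps +1 and -1 and the linear boundaries (1) and (2), where $Z_n$ equals a on every set of type 1 and b on every set of type 2. The sketch below makes these assumptions explicit; `hit_upper_prob` is a hypothetical helper that solves the first-step recursion with rational arithmetic.

```python
from fractions import Fraction


def hit_upper_prob(p: Fraction, a: int, b: int) -> Fraction:
    """P(a walk from 0 with steps +1 (prob p) and -1 (prob 1-p) reaches a before b)."""
    q = 1 - p
    u_prev, u = Fraction(0), Fraction(1)      # u(b) = 0, u(b+1) = 1 before rescaling
    values = {b: u_prev, b + 1: u}
    for s in range(b + 1, a):
        # rearranged first-step relation u(s) = p u(s+1) + q u(s-1)
        u_prev, u = u, (u - q * u_prev) / p
        values[s + 1] = u
    return values[0] / values[a]              # rescale so that u(a) = 1


p, q = Fraction(2, 5), Fraction(3, 5)
a, b = 3, -2
exp_h0 = q / p                                # e^{h0}, since p e^{h0} + q e^{-h0} = 1
gamma = hit_upper_prob(p, a, b)               # under f
gamma_star = hit_upper_prob(q, a, b)          # under f*, which interchanges p and q
```

Here $E_1(e^{Z_n h_0} \mid K) = e^{a h_0}$ and $E_2(e^{Z_n h_0} \mid K) = e^{b h_0}$, so (7) reduces to the two ratio identities tested on `gamma` and `gamma_star`.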
Theorem 2. If $Ez \ne 0$, the relation

(12) $E(n \mid K) = \dfrac{E(Z_n \mid K)}{Ez}$

holds for any sequence $K = \{k_i(z_1, \dots, z_i)\}$ for which one of the following two
conditions is fulfilled:

(a) There exists an integer N such that the probability that $n \le N$ is equal to one.

(b) $E(n \mid K) < \infty$ and the first four moments of z are finite.

Proof: First we shall show that condition (a) implies the validity of (12).
For any integer i we shall denote $z_1 + \cdots + z_i$ by $Z_i$. Since the probability
that $n \le N$ is equal to 1, we have

(13) $E(Z_n \mid K) + E(z_{n+1} + \cdots + z_N) = EZ_N = NEz.$

Since the conditional expected value of $(z_{n+1} + \cdots + z_N)$ for a given value of
n is equal to $(N - n)Ez$, we have

(14) $E(z_{n+1} + \cdots + z_N) = E(N - n \mid K)Ez = NEz - E(n \mid K)Ez.$

Equation (12) follows from (13) and (14).

Now we shall show that condition (b) implies (12). Denote by $P_N$ the
probability that $n \le N$. Let the operator $E_N$ denote conditional expected value
under the restriction that $n \le N$, and let the operator $E'_N$ denote conditional
expected value under the restriction that $n > N$. Then we have

(15) $P_N E_N(Z_N) + (1 - P_N)E'_N(Z_N) = E(Z_N) = NEz.$

Since

(16) $E_N(Z_N) = E_N(Z_n \mid K) + E_N(z_{n+1} + \cdots + z_N \mid K)$
$= E_N(Z_n \mid K) + E_N(N - n \mid K)Ez$
$= E_N(Z_n \mid K) + NEz - E_N(n \mid K)Ez,$

we obtain from (15)

(17) $P_N[E_N(Z_n \mid K) + NEz - E_N(n \mid K)Ez] + (1 - P_N)E'_N(Z_N) = NEz.$

From $E(n \mid K) < \infty$ it follows that

(18) $\lim_{N\to\infty} (1 - P_N)N = 0.$

Now we shall show that (18) implies the validity of

(19) $\lim_{N\to\infty} (1 - P_N)E'_N(Z_N) = 0.$

Let $T_N = Z_N - NEz$. Because of (18), (19) is proved if we can show that

(20) $\lim_{N\to\infty} (1 - P_N)E'_N(T_N) = 0.$

Denote by $R_N$ the set of all points $(z_1, \dots, z_N)$ for which $n > N$. Then the
probability measure of $R_N$ is equal to $1 - P_N$ and

(21) $(1 - P_N)E'_N(T_N) = \int_{R_N} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N.$

Let $R_N^1$ be the part of $R_N$ in which $T_N < -N$, $R_N^2$ the part of $R_N$ in which
$T_N > N$ and $R_N^3$ the part of $R_N$ in which $-N \le T_N \le N$. Because of (18) we have

(22) $\lim_{N\to\infty} \left| \int_{R_N^3} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N \right| \le \lim_{N\to\infty} (1 - P_N)N = 0.$

Denote the cumulative distribution function of $T_N$ by $F_N(T_N)$. Clearly,

(23) $\int_{R_N^2} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N \le \int_N^{\infty} T_N\, dF_N(T_N) \le \dfrac{1}{N^3} \int_{-\infty}^{\infty} T_N^4\, dF_N(T_N).$

Since the first four moments of z are finite, the 4-th moment of $T_N/\sqrt{N}$
converges to $3\sigma^4$ where $\sigma$ is the standard deviation of z. Hence

(24) $\lim_{N\to\infty} \dfrac{1}{N^2} \int_{-\infty}^{\infty} T_N^4\, dF_N(T_N) = 3\sigma^4.$

From (23) and (24) it follows that

(25) $\lim_{N\to\infty} \int_{R_N^2} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N = 0.$

Similarly we can prove that

(26) $\lim_{N\to\infty} \int_{R_N^1} T_N f(z_1) \cdots f(z_N)\, dz_1 \cdots dz_N = 0.$

Equation (20) follows from (21), (22), (25) and (26). Hence (19) is proved.
From (17), (18) and (19) we obtain

(27) $\lim_{N\to\infty} P_N\{E_N(Z_n \mid K) - E_N(n \mid K)Ez\} = 0.$

Since $Ez \ne 0$, $\lim P_N = 1$, $\lim E_N(n \mid K) = E(n \mid K)$ and
$\lim E_N(Z_n \mid K) = E(Z_n \mid K)$, equation (12) follows from (27). Hence
condition (b) implies (12) and Theorem 2 is proved.
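Relation (12) can be verified exactly for a walk with steps +1 and -1 stopped at the boundaries a and b, where both sides are rational numbers. The sketch below is an illustration under those assumptions; `boundary_value` is a hypothetical helper that solves the first-step equations $v(s) = \text{cost} + p\,v(s+1) + q\,v(s-1)$ by treating the unknown $v(b+1)$ as a parameter.

```python
from fractions import Fraction


def boundary_value(p, a, b, cost, upper):
    """Solve v(s) = cost + p v(s+1) + q v(s-1), v(b) = 0, v(a) = upper; return v(0).

    Each v(s) is stored as (alpha, beta), meaning alpha + beta * t with t = v(b+1).
    """
    q = 1 - p
    prev = (Fraction(0), Fraction(0))          # v(b) = 0
    cur = (Fraction(0), Fraction(1))           # v(b+1) = t
    table = {b: prev, b + 1: cur}
    for s in range(b + 1, a):
        nxt = ((cur[0] - cost - q * prev[0]) / p, (cur[1] - q * prev[1]) / p)
        table[s + 1] = nxt
        prev, cur = cur, nxt
    alpha_a, beta_a = table[a]
    t = (upper - alpha_a) / beta_a             # enforce the condition at s = a
    alpha_0, beta_0 = table[0]
    return alpha_0 + beta_0 * t


p, a, b = Fraction(2, 5), 3, -2
gamma = boundary_value(p, a, b, cost=0, upper=1)       # P(stop at a)
expected_n = boundary_value(p, a, b, cost=1, upper=0)  # E(n | K)
expected_Z = gamma * a + (1 - gamma) * b               # E(Z_n | K)
mean_z = p - (1 - p)                                   # E z
```

In this example the expected number of steps is 1110/211, and it coincides with $E(Z_n \mid K)/Ez$ as (12) asserts.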

4. Lower limit of $E(n \mid K)$. In this section we shall derive a lower limit for
$E(n \mid K)$. First we shall prove the following lemma.

Lemma 1. For any random variable u we have

(28) $e^{E(u)} \le Ee^{u}.$

Proof: Inequality (28) can be written as

(29) $1 \le Ee^{u'}$

where $u' = u - Eu$. Lemma 1 is proved if we show that (29) holds for any
random variable $u'$ whose mean is zero. Expanding $e^{u'}$ in a Taylor series
around $u' = 0$, we obtain

$e^{u'} = 1 + u' + \dfrac{u'^2}{2}\, e^{\xi(u')}$ where $\xi(u')$ lies between 0 and $u'$.

Hence

$Ee^{u'} = 1 + \tfrac{1}{2} E\, u'^2 e^{\xi(u')} \ge 1$

and Lemma 1 is proved.

Now we are able to prove the following theorem.

Theorem 3. Let $K = \{k_i(z_1, \dots, z_i)\}$ be a sequence of functions such that
the probability that $n < \infty$ is one under both distributions f(z) and f*(z) of z. Let
$\gamma$ be the probability that $k_n(z_1, \dots, z_n) \ge 1$ when f(z) is the distribution of z,
and let $\gamma^*$ be the probability of the same event when f*(z) is the distribution of z.
Then

(30) $E(n \mid K) \ge \dfrac{1}{h_0 Ez} \left[ \gamma \log \dfrac{\gamma^*}{\gamma} + (1 - \gamma) \log \dfrac{1 - \gamma^*}{1 - \gamma} \right]$

and

(31) $E^*(n \mid K) \ge \dfrac{1}{h_0 E^*z} \left[ \gamma^* \log \dfrac{\gamma^*}{\gamma} + (1 - \gamma^*) \log \dfrac{1 - \gamma^*}{1 - \gamma} \right],$

provided that $Ez$ and $E^*z$ are not equal to zero.

Proof: First we shall prove Theorem 3 in the case when there exists an integer
N such that the probability that $n \le N$ is one. According to Theorem 2 we have

(32) $E(n \mid K) = \dfrac{E(Z_n \mid K)}{Ez} = \dfrac{1}{Ez} \left[ \gamma E_1(Z_n \mid K) + (1 - \gamma) E_2(Z_n \mid K) \right].$

From Lemma 1 and Theorem 1 it follows that

(33) $h_0 E_1(Z_n \mid K) \le \log \dfrac{\gamma^*}{\gamma}$ and $h_0 E_2(Z_n \mid K) \le \log \dfrac{1 - \gamma^*}{1 - \gamma}.$

From (32) and (33) we obtain

(34) $h_0 Ez\, E(n \mid K) = h_0 \left[ \gamma E_1(Z_n \mid K) + (1 - \gamma) E_2(Z_n \mid K) \right] \le \gamma \log \dfrac{\gamma^*}{\gamma} + (1 - \gamma) \log \dfrac{1 - \gamma^*}{1 - \gamma}.$

Inequality (30) follows from (34) if we can show that $h_0 E(z) < 0$. From
$Ee^{h_0 z} = 1$ and Lemma 1 it follows that $h_0 E(z) \le 0$. Since $h_0 \ne 0$ and
$E(z) \ne 0$, we must have $h_0 E(z) < 0$. Hence (30) is proved. To prove (31) we
proceed as follows: From Theorem 2 we obtain

(35) $-h_0 E^*z\, E^*(n \mid K) = -h_0 \left[ \gamma^* E_1^*(Z_n \mid K) + (1 - \gamma^*) E_2^*(Z_n \mid K) \right].$

From Lemma 1 and Theorem 1 it follows that

(36) $-h_0 \left[ \gamma^* E_1^*(Z_n \mid K) + (1 - \gamma^*) E_2^*(Z_n \mid K) \right] \le \gamma^* \log \dfrac{\gamma}{\gamma^*} + (1 - \gamma^*) \log \dfrac{1 - \gamma}{1 - \gamma^*}.$

From (35) and (36) we obtain

(37) $h_0 E^*(z)\, E^*(n \mid K) \ge \gamma^* \log \dfrac{\gamma^*}{\gamma} + (1 - \gamma^*) \log \dfrac{1 - \gamma^*}{1 - \gamma}.$

Since $E^*e^{-h_0 z} = 1$ it follows from Lemma 1 that $-h_0 E^*z \le 0$. Inequality (31)
follows from this and (37). Hence Theorem 3 is proved in the special case when
there exists an integer N such that the probability that $n \le N$ is equal to one.

To prove Theorem 3 in the general case, for any integer N let the sequence
$K_N = \{k_{iN}(z_1, \dots, z_i)\}$ be defined as follows: $k_{iN}(z_1, \dots, z_i) = k_i(z_1, \dots, z_i)$
for $i < N$ and $k_{iN}(z_1, \dots, z_i) = 1$ for $i \ge N$. Denote by $\gamma_N$ and $\gamma_N^*$ the values
of $\gamma$ and $\gamma^*$, respectively, if the sequence K is replaced by $K_N$. Then we have

(38) $E(n \mid K) \ge E(n \mid K_N) \ge \dfrac{1}{h_0 Ez} \left[ \gamma_N \log \dfrac{\gamma_N^*}{\gamma_N} + (1 - \gamma_N) \log \dfrac{1 - \gamma_N^*}{1 - \gamma_N} \right]$

and

(39) $E^*(n \mid K) \ge E^*(n \mid K_N) \ge \dfrac{1}{h_0 E^*z} \left[ \gamma_N^* \log \dfrac{\gamma_N^*}{\gamma_N} + (1 - \gamma_N^*) \log \dfrac{1 - \gamma_N^*}{1 - \gamma_N} \right].$

Since $\lim_{N\to\infty} \gamma_N = \gamma$ and $\lim_{N\to\infty} \gamma_N^* = \gamma^*$, inequalities (30) and (31)
follow from (38) and (39). Hence the proof of Theorem 3 is completed.
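The lower limits (30) and (31) can be examined numerically on a small example in which the boundary may be overshot, so that the inequalities are not trivially equalities. The sketch below rests on assumptions chosen only for illustration: steps +2 with probability 0.2 and -1 with probability 0.8, linear boundaries a = 3 and b = -3, and a hypothetical helper `walk_stats` that obtains $\gamma$ and $E(n \mid K)$ by repeated substitution in the first-step equations.

```python
import math


def walk_stats(p_up, a, b, sweeps=5000):
    """(gamma, E n) for a walk from 0 with steps +2 (prob p_up) and -1,
    stopped when the sum is >= a or <= b."""
    q = 1.0 - p_up
    states = range(b + 1, a)

    def value(table, s, at_upper):
        if s >= a:
            return at_upper        # stopped at or beyond the upper boundary
        if s <= b:
            return 0.0             # stopped at the lower boundary
        return table[s]

    gamma = {s: 0.0 for s in states}
    dur = {s: 0.0 for s in states}
    for _ in range(sweeps):
        gamma = {s: p_up * value(gamma, s + 2, 1.0) + q * value(gamma, s - 1, 1.0)
                 for s in states}
        dur = {s: 1.0 + p_up * value(dur, s + 2, 0.0) + q * value(dur, s - 1, 0.0)
               for s in states}
    return gamma[0], dur[0]


p = 0.2
x = (-1.0 + math.sqrt(17.0)) / 2.0     # root of p x^2 + (1 - p) / x = 1, x != 1
h0 = math.log(x)
p_star = p * x * x                     # f*(2) = e^{2 h0} f(2)
mean_z = 2 * p - (1 - p)
mean_z_star = 2 * p_star - (1 - p_star)

g, n_mean = walk_stats(p, 3, -3)
g_star, n_mean_star = walk_stats(p_star, 3, -3)

log_upper = math.log(g_star / g)
log_lower = math.log((1 - g_star) / (1 - g))
bound = (g * log_upper + (1 - g) * log_lower) / (h0 * mean_z)                       # (30)
bound_star = (g_star * log_upper + (1 - g_star) * log_lower) / (h0 * mean_z_star)   # (31)
```

In this example the walk can stop at 3 or at 4 on the upper side, so Lemma 1 is applied with a non-degenerate conditional distribution and the computed expectations lie above the two bounds.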

6. Remarks added in proof. The results obtained in the present paper have 
obvious applications to sequential analysis. These applications are, however, 
not mentioned here, because at the time the present paper was submitted for 
publication, sequential analysis constituted classified material. In the
meantime, the material on sequential analysis has been released and was published in
this Journal, June, 1945. The results obtained in the present paper are more
general than those obtained in connection with sequential analysis. Theorem 3,
in the present paper, implies the efficiency of the sequential probability ratio
test discussed in Section 4.7 of the paper on sequential tests.
ON THE DESIGN OF EXPERIMENTS FOR WEIGHING AND MAKING
OTHER TYPES OF MEASUREMENTS

By K. Kishen

Department of Agriculture, Lucknow, India

1. Introduction. In a recent paper, Hotelling [1] has discussed the basic
principles of the theory of the design of efficient experiments for estimating the
true unknown weights of p given objects by means of a specified number N of
weighings, $p \le N$ in case the scale is free from bias and $p \le N - 1$ if it has a
bias the unknown value of which has to be estimated from the same data. He
has emphasized the importance of these designs in other kinds of measurements
besides weighing of objects and has called attention to the need for further
mathematical research for obtaining a “comprehensive general solution.” Such
a solution has now been obtained in case the number of weighings N is at our
choice. Some other general designs have also been given in this paper for
specified values of N and p.

2. Estimation of unknown weights and efficiency of a design. Using
Hotelling's notation, we may write

(1) $E(y_\alpha) = \sum_{i=1}^{p} x_{i\alpha} b_i$

where i = 1, 2, ... p, on the assumption that there is either zero bias in the
scale or the bias is known a priori, and $\alpha$ = 1, 2, ... N. $E(y_\alpha)$ is the
expectation of the $\alpha$th weighing. For a biassed scale, we may take i = 0, 1, 2, ... p.
The efficient estimate of each of the $b_i$'s has been derived by Hotelling by the
method of least squares. It is of interest to obtain these estimates by the use
of the theory of linear estimation as developed by Bose [2] and Rao [3].

Assuming that $y_1, y_2, \dots, y_N$ are N stochastic variates forming a
multivariate normal system with the variance and covariance matrix given by

(2) $U = [u_{ij}]$

it follows from Rao's generalization of Markoff's theorem that the best unbiassed
estimates of the $b_i$'s are given by the solutions of the normal equations

(3) $X'U^{-1}XB' = X'U^{-1}Y'$,

where $B = [b_1 b_2 \cdots b_p]$ and $Y = [y_1 y_2 \cdots y_N]$, and $B'$ and $Y'$ denote as usual
the transpose of the row vectors B and Y, i.e. column vectors.

In the present case, the assumption is that all the N stochastic variates are
uncorrelated and have a common variance $\sigma^2$, so that

(4) $U^{-1} = I/\sigma^2.$

Hence the normal equations in (3) reduce to

(5) $X'XB' = X'Y'$,

which are exactly the same as the normal equations given by Hotelling, since

(6) $X'X = [a_{ij}]$

where $a_{ij} = S(x_{i\alpha}x_{j\alpha})$.

Let $C = [c_{ij}]$ denote the reciprocal of the matrix X'X, so that $V(b_i) = c_{ii}\sigma^2$
and cov $(b_i b_j) = c_{ij}\sigma^2$. Then the mean variance of the p unknowns for a design
is given by

(7) $V_m = \dfrac{\sigma^2}{N} \cdot \dfrac{N \sum_{i=1}^{p} c_{ii}}{p}.$

If the main object of the experiment is to estimate the unknowns with the
least variance, the most efficient design (for a specified value of N) would be
the one for which the minimum minimorum of $\sigma^2/N$ is attained for all the p
unknowns so that the mean variance in this case is $\sigma^2/N$. The factor,
$N \sum_{i=1}^{p} c_{ii}/p$, on the right-hand side of (7), therefore, measures the increase
in variance resulting from the adoption of any design other than the most
efficient design. Its reciprocal, $p / (N \sum_{i=1}^{p} c_{ii})$, may appropriately be defined
as the efficiency of a given design for providing estimates of the p unknowns.
This quantity will now be utilized for judging the relative precision of the
general designs discussed in the subsequent paragraphs.


3. Design for $N = 2^m$, $p \le 2^m$ (zero bias) or $p \le 2^m - 1$ (non-zero bias).

By utilizing the properties of a 2-sided m-fold completely orthogonalized
Hyper-Graeco-Latin hyper-cube of the first order introduced by the author [4], it is
easy to see that for $N = 2^m$, $p \le 2^m$ (when there is zero bias) or $p \le 2^m - 1$
(when there is bias), m being any positive integer, a completely orthogonalized
design can be constructed with each unknown weight estimated with the
minimum variance $\sigma^2/N$. As remarked by Hotelling in the case of N = 4, p = 4
(for zero bias) or p = 3 (if there is bias), the matrix X'X for this design is a
scalar matrix of order p × p if there is zero bias, or of order (p + 1) × (p + 1)
if there is bias, each of the diagonal elements being N. The reciprocal matrix
is also a scalar matrix in which each of the diagonal elements is 1/N so that the
estimates of all the unknowns are mutually orthogonal.

As a particular case of this general design, we may take N = 16, p = 16 (for
zero bias) or p = 15 (if there is bias), the completely orthogonalized design for
which is represented by the matrix

(8) X = [16 × 16 array of entries +1 and -1; the individual entries are not legible here]

for which X'X is a scalar matrix of order 16 × 16, each diagonal element being
16. Again, a completely orthogonalized design for N = 16, p < 16 (for zero
bias) or p < 15 (if there is bias) is represented by a matrix X obtained from the
matrix in (8) by omitting any 16 - p of its columns if there is zero bias, or
16 - p - 1 of its columns if there is bias. In the matrix X, permutation of
rows and columns is permissible and each such matrix represents a completely
orthogonalized design.
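A 16 × 16 matrix with the stated property can be generated by the Sylvester doubling construction. This is offered only as one way of obtaining such an array, on the assumption that any matrix of +1 and -1 entries with mutually orthogonal columns serves the purpose; it is not claimed to reproduce the author's matrix (8), and the function names are hypothetical.

```python
def sylvester(m):
    """2^m x 2^m matrix of +1/-1 entries built by repeated doubling [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(m):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def gram(X):
    """X'X for a matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


X16 = sylvester(4)                   # N = 16, p = 16
G16 = gram(X16)                      # expected to be 16 times the unit matrix

X_sub = [row[:5] for row in X16]     # omit 11 columns: N = 16, p = 5
G_sub = gram(X_sub)
```

Omitting columns leaves the remaining columns orthogonal, which is why X'X stays a scalar matrix for every p < 16.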

For the design given by Hotelling 1 for N = 4, p = 3 (zero bias), the efficiency
is 35 per cent. The completely orthogonalized design for which the efficiency
is 100 per cent is represented by the matrix

(9) X =

1 1)

1 -1
1 1

1 -1,

1 The allusions here and at the end of the next section are to designs on p. 305 of the
Hotelling paper [1], a passage concerned with designs subject to the restriction that the
entries on the matrix be 0's and +1's only, as is necessary in many types of measurement.
The more efficient designs given above, whose matrices involve -1's also, can be used only
in such cases as that of weighing in a balance, where the objects under investigation can be
put, some in one pan and some in the other. Such situations are considered in a different
part of Hotelling's paper.

4. First design for $N = 2^m + 1$, $p \le 2^m$ (zero bias) or $p \le 2^m - 1$ (non-zero
bias). For $N = 2^m + 1$, $p \le 2^m$ (zero bias) or $p \le 2^m - 1$ (if there is bias),
m being any positive integer, probably the most efficient design available seems
to be that represented by the matrix X obtained from the corresponding matrix
X for the general design of Section 3 above by adding a row 1, 1, ... 1 to it.
The matrix X'X for this design then comes out as

(10) $X'X = \begin{bmatrix} N & 1 & 1 & \cdots & 1 \\ 1 & N & 1 & \cdots & 1 \\ 1 & 1 & N & \cdots & 1 \\ \vdots & & & & \vdots \\ 1 & 1 & 1 & \cdots & N \end{bmatrix}$
which is a symmetrical matrix of order p × p if there is zero bias, or of order
(p + 1) × (p + 1) if there is bias. The variance of each unknown for this
design is

(11) $\dfrac{\sigma^2}{N - \dfrac{p - 1}{N + p - 2}}$ for zero bias,

or

(12) $\dfrac{\sigma^2}{N - \dfrac{p}{N + p - 1}}$ if there is bias.

Thus the efficiency of this design is

(13) $1 - \dfrac{p - 1}{N(N + p - 2)}$ for zero bias,

or

(14) $1 - \dfrac{p}{N(N + p - 1)}$ if there is bias.

The loss of efficiency resulting from the adoption of this design is, therefore,
$\dfrac{p - 1}{N(N + p - 2)}$ for zero bias or $\dfrac{p}{N(N + p - 1)}$ if there is bias.

As a particular case of this, for N = 5, p = 2 (zero bias), probably the most
efficient design available is specified by


(15) 


X 


1 

1 

1 

-1 

-1 


The variance of each unknown in this case is $5\sigma^2/24$ and the efficiency of the
design is 96 per cent. For the design given by Hotelling for this case, the
variance of each unknown is $4\sigma^2/7$ and the efficiency is 35 per cent. It would
thus appear that, as judged by the criterion of efficiency as defined here, the
design represented by the matrix in (15) is more efficient than Hotelling's design.
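The efficiency $p / (N \sum c_{ii})$ of Section 2 can be evaluated exactly for any design matrix. The sketch below is illustrative: the function names are hypothetical, the zero-bias case is assumed, and the 5 × 2 matrix used is one possible choice (two orthogonal columns of +1 and -1 entries with a row 1, 1 added), not necessarily the matrix printed in (15).

```python
from fractions import Fraction


def gram(X):
    """X'X for a matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def inverse(A):
    """Exact inverse of a non-singular matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def efficiency(X):
    """p / (N * sum of the diagonal elements of (X'X)^-1), zero-bias case."""
    N, p = len(X), len(X[0])
    C = inverse(gram(X))
    return Fraction(p) / (N * sum(C[i][i] for i in range(p)))


X = [[1, 1], [1, -1], [1, 1], [1, -1], [1, 1]]   # assumed design for N = 5, p = 2
C = inverse(gram(X))
eff = efficiency(X)
```

For this matrix each diagonal element of the reciprocal matrix is 5/24, so the variance of each unknown is $5\sigma^2/24$ and the efficiency is 24/25, i.e. 96 per cent.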

5. Second design for $N = 2^m + 1$, $p \le 2^m$ (zero bias) or $p \le 2^m - 1$
(non-zero bias). Another interesting design for these values of N and p is that
represented by the matrix X obtained by adding a row 1, 0, ... 0 to the
corresponding matrix X for the general design in Section 3 above. The matrix X'X
for this design is then the diagonal matrix

(16) $X'X = \begin{bmatrix} N & 0 & \cdots & 0 \\ 0 & N - 1 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & N - 1 \end{bmatrix}$

of order p × p (for zero bias) or (p + 1) × (p + 1) (for non-zero bias). As
the reciprocal of this matrix is also a diagonal matrix, the estimates of all the
unknowns are mutually orthogonal. The efficiency of this design is

(17) $\dfrac{(N - 1)p}{Np - 1}$ for zero bias,

or

(18) $\dfrac{N - 1}{N}$ for non-zero bias.

By comparing the efficiency of the first design given in (13) and (14) with that
of the second design in (17) and (18) respectively, it would appear that the
efficiency of the first design is always higher than that of the second design for
non-zero bias, and is also higher in the case of zero bias for p > 1, but equal for
p = 1.

6. First design for $N = 2^m + r$, $p \le 2^m$ (for zero bias) or $p \le 2^m - 1$ (for
non-zero bias). For $N = 2^m + r$, $p \le 2^m$ (for zero bias) or $p \le 2^m - 1$ (for
non-zero bias), m being any positive integer and r any positive integer $< 2^m$,
a highly efficient design is represented by the matrix X obtained from the
corresponding matrix X for the general design in Section 3 above by adding r
rows 1, 1, ... 1 to it. The matrix X'X for these designs then comes out as

(19) $X'X = \begin{bmatrix} N & r & r & \cdots & r \\ r & N & r & \cdots & r \\ r & r & N & \cdots & r \\ \vdots & & & & \vdots \\ r & r & r & \cdots & N \end{bmatrix}$

which is of order p × p for zero bias, or of order (p + 1) × (p + 1) for non-zero
bias. The variance of each unknown determined by this experiment is

(20) $\dfrac{\sigma^2}{N - \dfrac{(p - 1)r^2}{N + (p - 2)r}}$ for zero bias,

or

(21) $\dfrac{\sigma^2}{N - \dfrac{pr^2}{N + (p - 1)r}}$ if there is bias.

Hence the efficiency of this design is

(22) $1 - \dfrac{(p - 1)r^2}{N[N + (p - 2)r]}$ for zero bias,

or

(23) $1 - \dfrac{pr^2}{N[N + (p - 1)r]}$ if there is bias.

The loss of efficiency as a result of adopting this design is, therefore,
$\dfrac{(p - 1)r^2}{N[N + (p - 2)r]}$ for zero bias, or $\dfrac{pr^2}{N[N + (p - 1)r]}$ if there is bias.
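The variance and efficiency stated for this design in the zero-bias case can be recovered by inverting X'X directly. The derivation below is a sketch that assumes the p columns taken from the Section 3 design are mutually orthogonal, so that X'X is $(N - r)$ times the unit matrix plus r times the matrix of ones; the symbols $I_p$ and $J_p$ are introduced here only for this purpose.

```latex
\begin{align*}
X'X &= (N - r)\,I_p + r\,J_p, \qquad J_p = \mathbf{1}\mathbf{1}', \quad J_p^2 = p\,J_p, \\
(X'X)^{-1} &= \frac{1}{N - r}\left( I_p - \frac{r}{N + (p - 1)r}\,J_p \right), \\
c_{ii} &= \frac{N + (p - 2)r}{(N - r)\,[N + (p - 1)r]}
        = \frac{1}{N - \dfrac{(p - 1)r^2}{N + (p - 2)r}}, \\
\text{efficiency} &= \frac{p}{N \sum_{i} c_{ii}} = \frac{1}{N c_{ii}}
        = 1 - \frac{(p - 1)r^2}{N\,[N + (p - 2)r]} .
\end{align*}
```

The second line can be checked by multiplying out with $J_p^2 = p J_p$, and the third uses $(N - r)[N + (p - 1)r] = N[N + (p - 2)r] - (p - 1)r^2$. Replacing p by p + 1 gives the corresponding expressions for the case with bias, and r = 1 gives those of Section 4.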

7. Second design for $N = 2^m + r$, $p \le 2^m$ (for zero bias) or $p \le 2^m - 1$ (for
non-zero bias). Another design for these values of N and p is that represented
by the matrix X obtained from the corresponding matrix X for the general
design in Section 3 above by adding to it r rows 1, 0, 0, ... 0. The matrix X'X
for this design is then given by

(24) $X'X = \begin{bmatrix} N & 0 & 0 & \cdots & 0 \\ 0 & N - r & 0 & \cdots & 0 \\ 0 & 0 & N - r & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & N - r \end{bmatrix}$

which is of order p × p if there is zero bias, or of order (p + 1) × (p + 1) if
there is bias. Here also the estimates of all the unknowns are mutually
orthogonal. The efficiency of the design comes out to be

(25) $\dfrac{(N - r)p}{Np - r}$ if there is zero bias,

or

(26) $\dfrac{N - r}{N}$ if there is bias.

By comparing the efficiency of the first design of this type given in (22) and
(23) with that of the present design given in (25) and (26) respectively, it would
appear that in case of zero bias, the efficiency of the first design is higher than
that of the second design for p > 1, but equal for p = 1; and in case of
non-zero bias, the efficiency of the first design is always higher than that of the
second.
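The comparison of the two designs in the zero-bias case can be confirmed over a range of values with exact arithmetic. The sketch below simply evaluates expressions (22) and (25); the function names and the range of m examined are arbitrary choices made for illustration.

```python
from fractions import Fraction


def eff_first(N, p, r):
    """Efficiency (22): r rows 1, 1, ..., 1 added, zero bias."""
    return 1 - Fraction((p - 1) * r * r, N * (N + (p - 2) * r))


def eff_second(N, p, r):
    """Efficiency (25): r rows 1, 0, ..., 0 added, zero bias."""
    return Fraction((N - r) * p, N * p - r)


rows = []
for m in range(1, 6):
    for r in range(1, 2 ** m):              # r is a positive integer < 2^m
        N = 2 ** m + r
        for p in range(1, 2 ** m + 1):      # p <= 2^m
            rows.append((N, p, r, eff_first(N, p, r), eff_second(N, p, r)))
```

The difference between the two losses of efficiency has the sign of $(N - r)^2$, which is why the first design is never worse and the two coincide only when p = 1.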

8. Comprehensive general design when N is at our choice. When N is at
our choice, we can always obtain a completely orthogonalized design by taking
N equal to a sufficiently large power of 2. For $p = 2^m$, m being any positive
integer, a completely orthogonalized design for $N = 2^m$, when there is zero
bias, has been given in Section 3 above. If, however, there is a bias, a
completely orthogonalized design can be constructed for $N = 2^{m+1}$. When
$p = 2^m + u$, where u is a positive integer $< 2^m$, a completely orthogonalized design
is available for $N = 2^{m+1}$, whether the bias is zero or not.

For $N = 2^{m+1}$, this is the most efficient design, with 100 per cent efficiency,
but as N is given higher powers of 2 than $2^{m+1}$, the variance of the estimate of
each unknown decreases. When $N = 2^l$, where $l > m + 1$, the variance of
each unknown is $1/2^{\,l-m-1}$ of that for $N = 2^{m+1}$.
NOTES

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


NOTE ON THE LAW OF LARGE NUMBERS AND “FAIR” GAMES 

By W. Feller 
Cornell University 

1. “Fair” games. Let $\{X_k\}$ be a sequence of independent random variables
with the same cumulative distribution function V(x). Suppose that the
expectation

(1) $E(X_k) = \int_{-\infty}^{\infty} x\, dV(x) = M$

exists, and put

(2) $S_n = X_1 + \cdots + X_n.$

The weak law of large numbers states 1 that for every $\epsilon > 0$ and $n \to \infty$

(3) $\Pr\{|S_n - nM| < \epsilon n\} \to 1.$

In the picturesque language of the theory of games this means that, after a
large number of trials, the accumulated gain $S_n$ will, with great probability, be
of the order of magnitude of nM. This led to the definition that a game is
“fair” if the entrance fee for each trial is M. Unfortunately this definition
creates the erroneous notion that a “fair” game is necessarily fair. To disprove
it we shall (section 3) exhibit an example which will show:

(I) A game can be “fair” and nevertheless such that the probability tends to one
that, after n trials, the player will have sustained a loss $L_n = nM - S_n$ of the order
of magnitude $n(\log n)^{-\eta}$ where $\eta > 0$ is arbitrarily small. In other words, in our
example

(4) $\Pr\{nM - S_n > (1 - \epsilon)n(\log n)^{-\eta}\} \to 1.$

Of course, $L_n$ is necessarily of smaller order of magnitude than n; however, our
example can be modified in such a way that the ratio of the loss $L_n$ to the
accumulated entrance fees nM decreases as slowly as one pleases.

This shows that a “fair” game can be exceedingly disadvantageous.
Conversely, an “unfair” game can very well be advantageous. If a careful driver
insures his car, the game is clearly “unfair” according to definition, and yet some
states impose such games on drivers. Now in this and many other practical
cases the game is of such a nature that there is a very small probability p of
winning a comparatively great amount A; the “fair” price would be pA. In
such cases the law of large numbers would be significant only if n is large
compared to 1/p, whereas actually the maximum number of games to be played is
comparatively small. Clearly any theory meets practical requirements only
if it makes allowance for the number of trials and makes the “fair” price depend
on the number of trials.

1 Usually (3) is proved only under more restrictive hypotheses. Actually the finiteness
of $E(X_k)$ implies even the strong law of large numbers; cf. Kolmogoroff, Grundbegriffe der
Wahrscheinlichkeitsrechnung (Berlin 1933), p. 59.

2. The Petersburg “paradox*” For obvious reasons the classical theory of 
probability was unable to provide a precise formulation of the law of large 
numbers and to establish the actual conditions of its validity. Often it has 
been looked upon as a direct consequence of the definition of probability, and 
this led to the so-called Petersburg paradox which presents no difficulties to the 
modern theory. It refers to the ease where the expectation (1) is infinite. The 
usual example exhibits a game in which the possible gains in each trial are 
distributed according to 

(5) Pr {X = 2*} - 2“\ 

Here M = ce. Now the laV of large numbers (3) used to be proved (if at all) 
only assuming the existence of moments of higher order. Nevertheless, the 
classical theory postulated the validity of (3) even for M = °©, and treating 
as a number (with <*> — ao = 0) it argued that « isa “fair” price for the game 
as defined in (5). Great ingenuity was exercized in order to reconcile this 
result with commonsense. 2 Actually one can pass from (3) to the limit M —> oc, 
but the only result to be arrived at is trivial and could be anticipated without 
theory: If the player pays for each trial a fixed amount A, he is likely to have a 
positive gain provided he plays sufficiently long, i.e., provided n > N(A ), 
where N{A) itself increases with A. 

Instead of a paradox we reach the conclusion that the price should depend on 
n, that is to say vary as the number of trials increases. For best results this 
should be the case even if M is finite. It should be noticed that in the Petersburg 
case (5) a variable price can be determined so that a law of large numbers will 
hold which is in every respect analogous to (3). In this formula nM is simply 
the accumulated amount of entrance fees; denoting it by P n , formula (3) takes 
on the equivalent form 


* Among the latest textbooks, von Mises (Wahrscheinlichkeitsrechnung , Leipzig-Wien 
1931, p. 108f.) avoids the difficulty by declaring that (5) can not represent a collectif because 
of its infinite tail. This viewpoint is legitimate, but makes the law of large numbers inap¬ 
plicable to practically all useful distributions. Fry 0 Probability and its Engineering Uses , 
New York, 1928, p. 197) says: “The true explanation of the paradox is . . . based upon the 
fact that in our every-day experience we have to deal only with individuals who have finite 
fortunes and who would therefore be incapable of paying back the sums which are required 
. ..”. The problem does not seem to be mentioned in Uspensky’s book. 
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(6) * Pr {| $* — P n | < «P n } —* 1. 

It is this interpretation of (3) that leads to the notion of “fair” games. Now 
the Petersburg game can also be played in a “fair” way: 

(II) Let the player in the Petersburg game (5) at the k-th trial pay the amount³
$\log_2 k$. The accumulated entrance fees up to the n-th trial are $P_n \sim n \log_2 n$,
and the game is “fair” in the sense that the law of large numbers (6) holds. This
requirement determines the entrance fees essentially uniquely (that is to say up to
terms of smaller order of magnitude which, by definition, remain undetermined).
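
The fee schedule of theorem (II) can be tried out numerically. The following is a minimal simulation sketch, not part of the original note. It assumes the usual form of the Petersburg game for (5), which is not reproduced in this section: a fair coin is tossed until the first head, and the gain is 2^k if this takes k tosses. All function names are illustrative.

```python
import math
import random


def petersburg_payoff(rng):
    """One play (assumed form of game (5)): toss until the first head, win 2**k after k tosses."""
    k = 1
    while rng.random() < 0.5:
        k += 1
    return 2 ** k


def accumulated_fees(n):
    """P_n = sum of log2(k) for k = 1..n, the entrance fees of theorem (II)."""
    return sum(math.log2(k) for k in range(1, n + 1))


def simulate(n, seed=0):
    """Return (accumulated gains S_n, accumulated fees P_n, ratio S_n / P_n)."""
    rng = random.Random(seed)
    gains = sum(petersburg_payoff(rng) for _ in range(n))
    fees = accumulated_fees(n)
    return gains, fees, gains / fees


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        gains, fees, ratio = simulate(n, seed=n)
        print(n, gains, round(fees, 1), round(ratio, 3))
```

Since (6) only asserts convergence in probability, the ratio for moderate n can still be visibly different from 1 in a single run.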


3. Proofs. Theorems (I) and (II) follow easily from the following
Lemma: Let $a_n \to \infty$ be a sequence of positive numbers; in order that there exist
a sequence $\{b_n\}$ such that

(7) $\Pr\{|S_n - b_n| < \epsilon a_n\} \to 1$

it is necessary and sufficient that for every $\delta > 0$ simultaneously

(8) [display not recoverable from the source]

in this case (7) will hold with

(9) $b_n = \sum_{k=1}^{n} \int_{|x| < a_k} x \, dV(x)$

(and, of course, for any other sequence $\{b_n^*\}$ if and only if $|b_n^* - b_n| = o(a_n)$).
This lemma is a simple consequence of the necessary and sufficient conditions
for the generalized law of large numbers⁴.

To prove theorem (II) we have to determine a sequence $\{a_n\}$ such that (7)
will hold for the distribution function defined in (5) and with $b_n \sim a_n$. A simple
computation shows that (8) will hold for any sequence $\{a_n\}$ which increases
faster than n. Moreover, the sequence $\{b_n\}$ defined by (9) will be of the same
order of magnitude as $\{a_n\}$ if, and only if, $a_n \sim n \log_2 n$. This proves (II).

Now let $\eta > 0$ be arbitrary, and define the distribution function V(x) to have
a density

(10) $V'(x) = \dfrac{\eta}{x^2 \log^{1+\eta} x}$ for x > e;

at x = 0 the function V(x) shall have a jump of magnitude

(11) $1 - \displaystyle\int_e^\infty \frac{\eta \, dx}{x^2 \log^{1+\eta} x}$,

while V(x) is constant in the intervals x < 0 and 0 < x < e. For this distribution
function we have obviously M = 1.
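
As a hedged numerical check, not part of the original note, the truncated mean of this distribution can be compared with its closed form. Assuming the density $\eta/(x^2 \log^{1+\eta} x)$ for x > e, the substitution u = log x gives $\int_e^a x V'(x)\,dx = 1 - \log^{-\eta} a$, which tends to M = 1 as a grows. The function names and the choice of a and $\eta$ below are illustrative.

```python
import math


def density(x, eta):
    """Assumed density V'(x) = eta / (x**2 * log(x)**(1 + eta)) for x > e, zero elsewhere."""
    if x <= math.e:
        return 0.0
    return eta / (x * x * math.log(x) ** (1.0 + eta))


def truncated_mean_closed_form(a, eta):
    """Integral of x * V'(x) over e < x < a, which equals 1 - log(a)**(-eta)."""
    if a <= math.e:
        return 0.0
    return 1.0 - math.log(a) ** (-eta)


def truncated_mean_numeric(a, eta, steps=200_000):
    """Midpoint rule in u = log x, where x * V'(x) dx becomes eta * u**(-(1 + eta)) du."""
    lo, hi = 1.0, math.log(a)
    h = (hi - lo) / steps
    return sum(eta * (lo + (i + 0.5) * h) ** (-(1.0 + eta)) * h for i in range(steps))


if __name__ == "__main__":
    for a in (1e2, 1e6, 1e12):
        print(a, truncated_mean_closed_form(a, 0.5), truncated_mean_numeric(a, 0.5))
```

The slow approach of the truncated mean to 1 is what drives the proof of (I): the centering constants stay noticeably below nM.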

³ $\log_2$ stands for the logarithm to the basis 2.

⁴ Cf. Feller, Acta Univ. Szeged, Vol. 8 (1937), pp. 191-201.





Next, let for n > e

(12) $a_n = n \log^{-\eta} n$.

Then (8) holds and from (9) and (10) we obtain easily for large n

(13) $b_n = \sum_{k=1}^{n} \{1 - \log^{-\eta} a_k\} < n - (1 - \epsilon) a_n$.

Substituting into (7) one sees that, again for sufficiently large n,

(14) $\Pr\{S_n - n + (1 - \epsilon) a_n < \epsilon a_n\} \to 1$,

or, since M = 1,

(15) $\Pr\{S_n - nM < -(1 - 2\epsilon) a_n\} \to 1$.

This proves (I).


A NOTE ON RANK, MULTICOLLINEARITY AND MULTIPLE 
REGRESSION 1 

By Gerhard Tintner 
Iowa State College 

Let $X_{it}$ (i = 1, 2, ..., M) be a set of M random variables, each being observed at
t = 1, 2, ..., N. $X_{it} = M_{it} + y_{it}$. (This is essentially the situation envisaged
by Frisch [1]). The systematic part of our variables is $M_{it} = E X_{it}$. The $y_{it}$ are
normally distributed with means zero. Their variances and covariances are
independent of t. The $M_{it}$ and $y_{it}$ are independent of each other. Define
$\bar{X}_i = \sum_t X_{it}/N$, the arithmetic mean of $X_{it}$, and $x_{it} = X_{it} - \bar{X}_i$, the deviation from
the mean. Then $a_{ij} = \sum_t x_{it} x_{jt}/(N - 1)$ gives the variances and covariances
of the observations. We want to determine the rank of the matrix of the
variances and covariances of $M_{it}$.

Now assume that $\|V_{ij}\|$ is an estimate of the variance-covariance matrix of the
error terms or “disturbances” $y_{it}$. The elements of this matrix are distributed
according to the Wishart distribution and are independent of the $M_{it}$. They
can be estimated as deviations from polynomial trends, as deviations from
Fourier series, by the Variate Difference Method, etc. The estimates could also
be based upon a priori knowledge if for instance the $y_{it}$ are interpreted as errors
of measurement. Assume that the estimate is based upon N' observations.

¹ The author is much obliged to Professors W. G. Cochran (Iowa State College), H.
Hotelling (Columbia University), T. Koopmans (University of Chicago) and A. Wald
(Columbia University) for advice and criticism with this paper. He has also profited by
reading the unpublished paper: “On the Validity of an Estimate from a Multiple Regression
Equation” by F. V. Waugh and R. O. Been which deals in part with a problem related to
the one presented here.

Journal Paper No. J-1S23 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project No. 730.

Form the determinantal equation:

(1) $|a_{ij} - \lambda V_{ij}| = 0$.

Apart from sampling fluctuations there should be r solutions $\lambda = 1$ of equation
(1) if there are r independent linear relationships between the $M_{it}$. The rank
of the variance-covariance matrix of $M_{it}$ is then M − r. Following a suggestion
of P. L. Hsu [2] made on the basis of the earlier work of R. A. Fisher [3] we form
the test function

(2) $\Lambda_r = (N - 1)(\lambda_1 + \lambda_2 + \cdots + \lambda_r)$,

where $\lambda_1$ is the smallest root of (1), $\lambda_2$ the next smallest, etc. Hence (2) is the
sum of the r smallest roots of equation (1), multiplied by N − 1. The hypothesis to be tested is that
there are exactly r independent linear relationships between the systematic
parts of our variables in the population. This quantity (2) is distributed like
$\chi^2$ with r(N − M − 1 + r) degrees of freedom for large samples, i.e. if N' becomes
large. It can be used for forming an opinion about the number of independent
relationships existing among the systematic parts of our variables.

The importance of the question of the rank lies in the following: Sometimes
we are not so much interested in making predictions as in estimating the “true”
relationships which exist in the population which corresponds to our sample
(Wald) [4]. Practically speaking, these relationships and their estimation are
of great importance in economic statistics, as Haavelmo has shown [5]. But a
knowledge of the rank, i.e. the number of independent relationships existing between
the systematic parts of the variables, may also be of some significance for
the problem of prediction. The inclusion of strongly correlated predictors
cuts down on the number of degrees of freedom without contributing significantly
to the reduction of the variance.

The remainder of this paper will be concerned with an attempt to estimate
the relationships which in the population exist between the systematic parts
of the variables. This is an extension of the work of T. Koopmans [6] and the
author [7] who dealt with the special case in which there is only one relationship
between the systematic parts.

Suppose that we decide that there are R independent relationships among the
systematic parts of our variables

(3) $k_{v0} + \sum_j k_{vj} M_{jt} = f_{vt} = 0$; v = 1, 2, ..., R; t = 1, 2, ..., N.

We desire to obtain estimates of these relationships. Our purpose here is not
prediction but estimation of the structural coefficients $k_{vj}$.

The method of maximum likelihood leads to the method of least squares if we
treat the $V_{ij}$ as constants. This is again permissible if N' is large and our estimates
of the $V_{ij}$ become reasonably accurate. We have to minimize the following
sum of squares


(4) $Q = \sum_t Q_t$,

where

(5) $Q_t = \sum_i \sum_j V^{ij} (x_{it} - m_{it})(x_{jt} - m_{jt})$,

where $\|V^{ij}\| = \|V_{ij}\|^{-1}$ is the inverse of the variance-covariance matrix of the
errors. We also define $m_{it} = M_{it} - \bar{M}_i$ (t = 1, 2, ..., N), where $\bar{M}_i$ is the
mean of $M_{it}$.

If there are R relationships (3) they can be written by using only R(M − R)
coefficients $k_{vj}$ (j = 1, 2, ..., M), if we disregard the constant terms $k_{v0}$, because
we are now dealing with deviations from means. We can for instance express
the first (M − R) variables $m_{it}$ in terms of the last R variables $m_{it}$. Hence,
we have to impose R² conditions upon the MR coefficients $k_{vj}$ (j = 1, 2, ..., M)
appearing in (3).

We impose R(R + 1)/2 conditions as follows

(6) $\sum_i \sum_j k_{vi} k_{wj} V_{ij} = g_{vw} = \delta_{vw}$,

where $\delta_{vw}$ is a Kronecker delta. These conditions orthogonalize and normalize
the coefficients $k_{vj}$. We have now to adjust the $Q_t$ as given in (5) under the
conditions (6) by determining appropriate $m_{it}$. This is a problem of restricted
minima.

We introduce a new function

(7) $F_t = Q_t - \sum_v \mu_{vt} f_{vt}$,

where the $\mu_{vt}$ are Lagrange multipliers. Differentiating with respect to $m_{it}$ and
setting equal to zero we get the solution:

(8) $\sum_j V^{ij} (x_{jt} - m_{jt}) = \sum_v \mu_{vt} k_{vi}$ (i = 1, 2, ..., M);

or, solving for $x_{it} - m_{it}$,

(9) $x_{it} - m_{it} = \sum_v \sum_j V_{ij} k_{vj} \mu_{vt}$, i = 1, 2, ..., M.

Multiplying (9) by $k_{vi}$ and summing we get

(10) $\mu_{vt} = \sum_j k_{vj} x_{jt}$.

Hence we have

(11) $Q_t = \sum_v \mu_{vt}^2 = \sum_v \big(\sum_j k_{vj} x_{jt}\big)^2$.

Now we dispose of the remaining R(R − 1)/2 conditions

(12) $\sum_t \mu_{vt} \mu_{wt} = 0$, v ≠ w.

We have to maximize Q under the R² conditions (6) and (12). This is done
by finding the appropriate $k_{vj}$.

We form a new expression

(13) $\Phi = Q + \sum_v \sum_w \beta_{vw} \sum_t \mu_{vt} \mu_{wt} - \sum_v \sum_w \alpha_{vw} g_{vw}$,

where the $\alpha_{vw}$ and $\beta_{vw}$ (v ≠ w) are again Lagrange multipliers and $\beta_{vv} = 0$.
Because of considerations of symmetry we have: $\alpha_{vw} = \alpha_{wv}$ and $\beta_{vw} = \beta_{wv}$.
Differentiating with respect to $k_{vi}$ and setting equal to zero we get the condition

(14) $\sum_t \big(\sum_j k_{vj} x_{jt}\big) x_{it} + \sum_w \beta_{vw} \sum_t \big(\sum_j k_{wj} x_{jt}\big) x_{it} = \sum_w \alpha_{vw} \sum_j V_{ij} k_{wj}$,
v = 1, 2, ..., R; i = 1, 2, ..., M.

Multiplying by $k_{vi}$ and summing we get

(15) $\sum_t \mu_{vt}^2 = \alpha_{vv}$.

Multiplying by $k_{zi}$ (z ≠ v) and summing we have

(16) $\beta_{vz} \sum_t \mu_{zt}^2 = \alpha_{vz}$ (v ≠ z).

Both (15) and (16) follow from conditions (6) and (12).

Exchanging the role of v and z in (16) we have also

(17) $\beta_{vz} \sum_t \mu_{vt}^2 = \alpha_{vz}$ (v ≠ z).

Hence we have $\alpha_{vw} = \beta_{vw} = 0$, if v ≠ w. Inserting these results in (14) we get a
system of linear and homogeneous equations in the unknown coefficients $k_{vj}$.
The determinant of the system must be equal to zero in order to yield non-trivial
solutions. Trivial solutions are not admitted because of (6). Hence the $\alpha_{vv}$
are simply the roots k of the equation $|\sum_t x_{it} x_{jt} - k V_{ij}| = 0$.

Introducing

(18) $\lambda_v = \alpha_{vv}/(N - 1)$,

expression (14) becomes actually the determinantal equation (1). This expression
can be used to find the R smallest latent roots $\lambda_v$ and the corresponding
characteristic vectors $k_{vj}$ by Hotelling’s methods [8].

The constants of the equation (3) are finally determined by the condition
that the optimum solutions have to go through the means of the variables

(19) $k_{v0} + \sum_j k_{vj} \bar{X}_j = 0$.

The distribution of the variances and covariances of the observations has recently
been established by T. W. Anderson and M. A. Girshick for the cases R =
M − 1 and R = M − 2 [9].


NOTE ON THE DISTRIBUTION OF THE SERIAL
CORRELATION COEFFICIENT¹

By William G. Madow 
Bureau of the Census 

The distribution of the serial correlation coefficient when $\rho = 0$ has been
previously obtained.² The purpose of this note is to derive the distribution of
the serial correlation coefficient, using the circular definition, when $\rho \neq 0$.

Let us assume that the random variables $x_1, \ldots, x_N$ have a joint normal
distribution³ $p(x_1, \ldots, x_N \mid A, B, \mu)$ where

$\log p(x_1, \ldots, x_N \mid A, B, \mu) = \log K_1 - \tfrac12 \big[A \sum_i (x_i - \mu)^2 + 2B \sum_i (x_i - \mu)(x_{i+L} - \mu)\big]$,

the term in the bracket is positive definite, $K_1$ is independent of the $x_i$ and if
i + L > N then $x_{i+L} = x_{i+L-N}$. It is then clear that $\bar{x}$, $V_N$, and ${}_LC_N$, where
$\bar{x}$ is the arithmetic mean, $V_N = \sum_i (x_i - \bar{x})^2$ and

${}_LC_N = \sum_i (x_i - \bar{x})(x_{i+L} - \bar{x})$

are sufficient statistics with respect to the estimation of $\mu$, A, and B.
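
The circular statistics just defined are easy to compute directly. The sketch below is illustrative and not part of the original note; the function names are hypothetical. It also evaluates the bracketed quadratic form, which for any data equals $A V_N + 2B\,{}_LC_N + (A + 2B) N (\bar{x} - \mu)^2$, the identity that makes $\bar{x}$, $V_N$ and ${}_LC_N$ sufficient.

```python
def circular_statistics(x, lag=1):
    """Return (mean, V_N, C_N) with the circular convention x[i + lag] = x[(i + lag) mod N]."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    v = sum(di * di for di in d)
    c = sum(d[i] * d[(i + lag) % n] for i in range(n))
    return mean, v, c


def quadratic_form(x, a, b, mu, lag=1):
    """A * sum (x_i - mu)^2 + 2B * sum (x_i - mu)(x_{i+L} - mu), indices taken circularly."""
    n = len(x)
    squares = sum((xi - mu) ** 2 for xi in x)
    cross = sum((x[i] - mu) * (x[(i + lag) % n] - mu) for i in range(n))
    return a * squares + 2.0 * b * cross


if __name__ == "__main__":
    data = [1.0, 3.0, 2.0, 5.0, 4.0]
    mean, v, c = circular_statistics(data)
    print(mean, v, c, c / v)  # c / v is the circular serial correlation coefficient for lag 1
```

The ratio of the last two statistics is the quantity whose distribution the note goes on to derive.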

¹ Presented at a meeting of the Cowles Commission for Economic Research in Chicago,
January 31, 1945.

² See R. L. Anderson, “Distribution of the serial correlation coefficient”, pp. 1-13 and T.
Koopmans, “Serial correlation and quadratic forms in normal variables”, pp. 14-33, Annals
of Math. Stat., Vol. XIII, No. 1, March, 1942.

³ The expression $p(\xi_1, \ldots, \xi_n \mid \theta_1, \ldots, \theta_g)$ means the probability density or the
distribution of the random variables $\xi_1, \ldots, \xi_n$ for the given values of the parameters
$\theta_1, \ldots, \theta_g$. When used as an index of summation or multiplication, the letter i will
assume all values from 1 through N.

Let $V_N \cdot {}_LR_N = {}_LC_N$ define ${}_LR_N$, the serial correlation coefficient. Then if
A = 1, B = 0 Anderson has shown⁴ that, if N is odd, the joint distribution of
${}_1R_N$ and $V_N$ is given by

(1) $D(R_N, V_N) = K V_N^{\frac12 (N-3)} e^{-\frac12 V_N} \sum_{i=1}^{m} (\lambda_i - R_N)^{\frac12 (N-5)} / \alpha_i$, for $\lambda_{m+1} \le R_N \le \lambda_m$,

where

$R_N = {}_1R_N$, $\lambda_i = \cos \dfrac{2\pi i}{N}$, $\alpha_i = \prod_{j=1}^{\frac12 (N-1)} (\lambda_i - \lambda_j)$, for all j ≠ i,

and $K^{-1} = 2^{\frac12 (N-1)} \Gamma[\tfrac12 (N - 3)]$; while if N is even, the same formula holds except
that

$\alpha_i = \prod_{j=1}^{\frac12 (N-2)} (\lambda_i - \lambda_j) \sqrt{\lambda_i + 1}$, for all j ≠ i.

We now extend Anderson's distributions to the case where it is not assumed
that A = 1 and B = 0.

As a means of extending⁵ Anderson’s distribution let us recall that if $x_1, \ldots, x_N$
have a distribution $p(x_1, \ldots, x_N \mid \theta_1, \ldots, \theta_g)$ depending on several parameters
$\theta_1, \ldots, \theta_g$, and if $z_1, \ldots, z_k$ are a sufficient set of statistics with respect
to $\theta_1, \ldots, \theta_g$, i.e.

$p(x_1, \ldots, x_N \mid \theta_1, \ldots, \theta_g) = h(z_1, \ldots, z_k \mid \theta_1, \ldots, \theta_g)\, m(x_1, \ldots, x_N)$

where $m(x_1, \ldots, x_N)$ is independent of $\theta_1, \ldots, \theta_g$, then if the distribution of
$z_1, \ldots, z_k$ is found, assuming $\theta_1, \ldots, \theta_g$ have specific values $\theta_1^0, \ldots, \theta_g^0$,
then it follows that

$p(z_1, \ldots, z_k \mid \theta_1, \ldots, \theta_g) = p(z_1, \ldots, z_k \mid \theta_1^0, \ldots, \theta_g^0)\, \dfrac{h(z_1, \ldots, z_k \mid \theta_1, \ldots, \theta_g)}{h(z_1, \ldots, z_k \mid \theta_1^0, \ldots, \theta_g^0)}$.

We may call Anderson’s distribution given in (1), $p(R_N, V_N \mid 1, 0)$, i.e.
$p(R_N, V_N \mid 1, 0) = D(R_N, V_N)$.

Furthermore, $\bar{x}$ is distributed independently of $R_N$ and $V_N$ for all values of A
and B and hence by a simple transformation,⁶ we can apply the above theorem.

⁴ Anderson loc. cit. p. 3 and p. 5. Although the remainder of the note deals only with
the case where L = 1 the procedure is general and may be easily carried through for other

⁵ See W. G. Madow, “Contributions to the theory of multivariate statistical analysis”,
Trans. of the Amer. Math. Soc., Vol. 44, No. 3, November 1938, p. 461.

⁶ For a proof that an orthogonal transformation of the variables $x_i$ exists such that
$V_N$ and ${}_LC_N$ are simultaneously reduced to canonical forms involving the same N − 1 of
the variables of the transformation, and $\sqrt{N}(\bar{x} - \mu)$ is the Nth variable of the transformation,
see J. von Neumann, “Distribution of the ratio of the mean square successive difference
to the variance”, Annals of Math. Stat., Vol. XII, No. 4, December 1941, pp. 368, 369.
The proof there is given for $V_N$ and $\sum (x_i - x_{i+1})^2$ but is easily extended to this case.
Then it is easy to show that $\sqrt{N}(\bar{x} - \mu)$ is independently distributed of $V_N$ and ${}_LC_N$ and
has distribution $\log p[\sqrt{N}(\bar{x} - \mu) \mid A, B] = \log K_2 - \tfrac12 [A + 2B] N (\bar{x} - \mu)^2$ where $K_2 = (2\pi)^{-\frac12} (A + 2B)^{\frac12}$
and $K_1' K_2 = K_1$.

Then

$p(R_N, V_N \mid A, B) = p(R_N, V_N \mid 1, 0)\, \Phi$

where

$\Phi = \dfrac{K_1' e^{-\frac12 (A V_N + 2B R_N V_N)}}{(2\pi)^{-\frac12 (N-1)} e^{-\frac12 V_N}}$.

Hence it follows that

$p(R_N, V_N \mid A, B) = K K_1' (2\pi)^{\frac12 (N-1)} V_N^{\frac12 (N-3)} e^{-\frac12 V_N (A + 2B R_N)} \sum_{i=1}^{m} (\lambda_i - R_N)^{\frac12 (N-5)} / \alpha_i$,

for $\lambda_{m+1} \le R_N \le \lambda_m$, where the $\alpha_i$ have different values according to whether N
is odd or even. In order to evaluate $p(R_N \mid A, B)$ we then need only integrate
out $V_N$. Now

$\int_0^\infty V_N^{\frac12 (N-3)} e^{-\frac12 V_N (A + 2B R_N)} \, dV_N = \Gamma[\tfrac12 (N - 1)]\, (A/2 + B R_N)^{-\frac12 (N-1)}$.

Hence

$p(R_N \mid A, B) = K K_1' (2\pi)^{\frac12 (N-1)} \Gamma[\tfrac12 (N - 1)]\, (A/2 + B R_N)^{-\frac12 (N-1)} \sum_{i=1}^{m} (\lambda_i - R_N)^{\frac12 (N-5)} / \alpha_i$.

The parameters $K_1'$, A and B depend on the different types of assumptions that
may be made. In general

$K_1 = (2\pi)^{-\frac12 N} \Delta^{\frac12}$

where $\Delta$ is a circulant $(a_1, \ldots, a_N)$ such that

$a_1 = A$, $a_{1+L} = B$, $a_{1+(N-L)} = B$, $a_i = 0$ otherwise,

and hence, for L = 1,

$\Delta = \prod_i \big(A + 2B \cos \tfrac{2\pi i}{N}\big) = \prod_i (A + 2B \lambda_i)$.

Then, one assumption is

A = 4, B = -pA 1 

where p is the ‘‘true” serial correlation coefficient. Other assumptions are 
possible. 7 However, these vary with the problem under consideration and may 
be left for further examination. 

⁷ One possible alternative definition is given by W. J. Dixon, “Further contributions to
the problem of serial correlation”, Annals of Math. Stat., Vol. XV, No. 2, June 1944, p. 120,
equation (2.1).


NOTE ON A PAPER BY C. W. COTTERMAN AND L. H. SNYDER

By H. B. Mann 1 
Ohio State University 


C. W. Cotterman and L. H. Snyder [1] gave a method to test simple Mendelian
inheritance in randomly collected data. From a population assumed to
be at equilibrium a sample is taken. The number of homozygous recessives in
the sample is known. We wish to estimate the number of heterozygous individuals
in the sample.

Let $\alpha$ be the proportion of recessive genes among all genes in the population;
$\pi$, $\rho$, $\tau$ the proportion in the population of homozygous recessives, heterozygous
and homozygous dominant individuals respectively and p, r, t the sampling
values of $\pi$, $\rho$, $\tau$. Then

(1) $\pi = \alpha^2$, $\rho = 2\alpha(1 - \alpha)$, $\tau = (1 - \alpha)^2$, p + r + t = 1.

Cotterman and Snyder use as an estimate of r the quantity $2\sqrt{p}(1 - \sqrt{p})$.
It is the purpose of this note to show that this estimate is for all practical purposes
equivalent to the maximum likelihood estimate of r.

The joint distribution of p, r and t in samples of n is given by

(2) $P(p, r, t) = \dfrac{n!\, \pi^{np} \rho^{nr} \tau^{nt}}{(np)!\,(nr)!\,(nt)!} = \dfrac{n!\, \alpha^{2np} [2\alpha(1 - \alpha)]^{nr} (1 - \alpha)^{2nt}}{(np)!\,(nr)!\,(nt)!}$,

where P(p, r, t) is the probability of obtaining the values p, r, t in samples of n.
We wish to maximize P(p, r, t) for fixed values of p with respect to $\alpha$ and r.
Maximizing first with respect to $\alpha$ one easily obtains

(3) $2\alpha = 2p + r$.

We can regard $\alpha$ as a continuous parameter and hence (3) must hold at any
maximum of P(p, r, t). For any maximum of P(p, r, t) we must further have

$\dfrac{n!\, \pi^{np} \rho^{nr} \tau^{nt}}{(np)!\,(nr)!\,(nt)!} \ge \dfrac{n!\, \pi^{np} \rho^{nr+1} \tau^{nt-1}}{(np)!\,(nr + 1)!\,(nt - 1)!}$

and

$\dfrac{n!\, \pi^{np} \rho^{nr} \tau^{nt}}{(np)!\,(nr)!\,(nt)!} \ge \dfrac{n!\, \pi^{np} \rho^{nr-1} \tau^{nt+1}}{(np)!\,(nr - 1)!\,(nt + 1)!}$.

This leads to the inequalities

(4) $\dfrac{\tau}{nt} \ge \dfrac{\rho}{nr + 1}$, $\dfrac{\rho}{nr} \ge \dfrac{\tau}{nt + 1}$.

Substituting $t = 1 - p - r$, $\tau = 1 - \pi - \rho$ one easily obtains from (4)

(5) $\dfrac{\rho n - \rho n p + \rho}{n(1 - \pi)} \ge r \ge \dfrac{\rho n - \rho n p - \tau}{n(1 - \pi)}$.

¹ Research under a grant of the research foundation of Ohio State University.

The difference of the two bounds is $\dfrac{1}{n}$. Hence r must satisfy an equation

$r = \dfrac{\rho n - \rho n p + \rho}{n(1 - \pi)} - \dfrac{\epsilon}{n}$, $0 \le \epsilon \le 1$.

Substituting the values for $\rho$, $\pi$ and $\tau$ from (1) and (3) we obtain

$r^2 + 4r\Big[p - \dfrac{2 - \epsilon}{4n}\Big] - 4p(1 - p) - \dfrac{4p}{n} + \dfrac{2\epsilon(1 + p)}{n} = 0$,

$r = -2\Big[p - \dfrac{2 - \epsilon}{4n}\Big] + \sqrt{\dfrac{(2 - \epsilon)^2}{4n^2} + 4p - \dfrac{2\epsilon}{n}}$.

Since $0 \le \epsilon \le 1$ we obtain from (3)

(6) $\dfrac{1}{n} + \sqrt{\dfrac{1}{n^2} + 4p} - 2p \ge r \ge \dfrac{1}{2n} + \sqrt{\dfrac{1}{4n^2} + 4p - \dfrac{2}{n}} - 2p$.

From (6) we see that for all practical purposes we may use the estimate

$r = 2\sqrt{p}(1 - \sqrt{p})$.
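
The conclusion can be checked by brute force. The following sketch is an illustration under stated assumptions, not part of the original note: it takes the likelihood (2) with $\alpha$ replaced by p + r/2 as in (3), searches all admissible integer values of nr for the maximum, and compares the result with the bounds (6) and with the Cotterman and Snyder estimate. It assumes 0 < np < n; the sample size and count in the demonstration are hypothetical.

```python
import math


def log_likelihood(n, n_p, n_r):
    """Log of P(p, r, t) in (2), with alpha = p + r/2 as required by (3)."""
    n_t = n - n_p - n_r
    alpha = (2 * n_p + n_r) / (2.0 * n)
    return (math.lgamma(n + 1) - math.lgamma(n_p + 1)
            - math.lgamma(n_r + 1) - math.lgamma(n_t + 1)
            + 2 * n_p * math.log(alpha)
            + n_r * math.log(2.0 * alpha * (1.0 - alpha))
            + 2 * n_t * math.log(1.0 - alpha))


def mle_r(n, n_p):
    """Maximum likelihood value of r found by searching every admissible count nr."""
    best = max(range(0, n - n_p + 1), key=lambda n_r: log_likelihood(n, n_p, n_r))
    return best / n


def bounds(n, p):
    """Lower and upper bounds for r from (6)."""
    upper = 1.0 / n + math.sqrt(4.0 * p + 1.0 / n ** 2) - 2.0 * p
    lower = 1.0 / (2 * n) + math.sqrt(4.0 * p + 1.0 / (4 * n ** 2) - 2.0 / n) - 2.0 * p
    return lower, upper


def cotterman_snyder(p):
    """The estimate 2 * sqrt(p) * (1 - sqrt(p))."""
    return 2.0 * math.sqrt(p) * (1.0 - math.sqrt(p))


if __name__ == "__main__":
    n, n_p = 200, 50  # hypothetical sample: 50 homozygous recessives among 200
    print(mle_r(n, n_p), bounds(n, n_p / n), cotterman_snyder(n_p / n))
```

Both bounds in (6) differ from $2\sqrt{p}(1 - \sqrt{p})$ by terms of order 1/n, which is the sense in which the two estimates agree for practical purposes.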


REFERENCE 

[1] C. W. Cotterman and L. H. Snyder, “Tests of simple Mendelian inheritance in randomly
collected data of one and two generations,” Jour. Am. Stat. Assn.,
Vol. 34 (1939), pp. 511-523.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. R. G. D. Allen, who has been associated with the Combined Production 
and Resources in Washington has returned to the London School of Economics. 

Dr. Kenneth J. Arnold, who has been doing war research work with the 
Columbia University Statistical Research Group has returned to his position 
at the University of Wisconsin. 

Dr. Lee A. Aroian, on leave from Hunter College is serving as a research 
associate in the Applied Mathematics Panel Project at Berkeley, California 
under the direction of Professor Neyman. 

Dr. Ernest E. Blanche, has been appointed to the teaching staff of the Army 
University organized by the War Department for American veterans at Florence, 
Italy. 

Assistant Professor Z. W. Birnbaum of the University of Washington has
been promoted to an associate professorship.

Dr. Alva E. Brandt has returned from the Operational Research Section of 
the Ninth Air Force in Europe. 

Associate Professor R. S. Burington of the Case School of Applied Science 
has received the Meritorious Civilian Award from the United States Navy. 

Dr. Irving W. Burr has been promoted to an associate professorship at Purdue
University.

Miss Frances Campbell, after receiving her doctorate at Michigan in June, 
has returned to her position at George Pepperdine College, Los Angeles. 

Professor Harry C. Carver, after a year of service with the Army Air Forces, 
has returned to the University of Michigan. 

Professor W. G. Cochran has returned to Iowa State College from a special 
mission to Germany. 

Professor Churchill Eisenhart, who has been doing war research work with 
the Columbia University Statistical Research Group, has returned to the 
University of Wisconsin. 

Miss Mary Elveback has been appointed to an assistant professorship at 
Rockford College. 

Assistant Professor C. H. Fischer of the University of Michigan has been 
promoted to an associate professorship. 

Mr. Elvin A. Hoy, who has spent three years with the War Production Board, 
is now Chief of the Statistics Section of the Bureau of Research and Statistics 
of the Social Security Board. 

Professor P. L. Hsu of Kunming, China, has been appointed to a visiting 
professorship of statistics at Columbia University, beginning January 1946. 

Dr. Doncaster G. Humm has received an honorary Doctor of Science degree 
at Bucknell University. 




Mr. Joseph M. Juran, who has served during the war with the Foreign Economic
Administration, is now Chairman of the Department of Administrative
Engineering at New York University.

Dr. Eugene Lukacs has been appointed Professor and Head of the Mathematics
Department at Our Lady of Cincinnati College.

Dr. R. v. Mises of Harvard University has been appointed to a professorship
of aerodynamics and applied mathematics.

Professor A. M. Mood has returned from Princeton University to his position 
at Iowa State College. 

Assistant Professor Henry Scheffé of Syracuse University has been granted
leave of absence to serve as senior mathematician with Princeton University
Station of Division 2 of NDRC.

Symposium at the University of California 

A Symposium on Mathematical Statistics and Probability was held at the
University of California at Berkeley on August 13-18, 1945. Those participating
in the symposium as speakers or chairmen were:

Dean G. P. Adams, Prof. K. B. Babcock, Prof. E. M. Beesley, Prof. B. A. Bernstein, Prof.
Egon Brunswik, Prof. A. H. Copeland, Prof. P. H. Daus, Lt. Comm. F. W. Dresch, Prof.
G. C. Evans, Miss Evelyn Fix, Prof. Harold Hotelling, Prof. Victor F. Lenzen, Prof. Jay L.
Lush, Prof. J. H. McDonald, Prof. George F. McEwen, Prof. J. Neyman, Prof. G. Polya,
Prof. Hans Reichenbach, Prof. A. C. Schaeffer, Prof. Morgan Ward, and Dr. Jacob Wolfowitz.


New Members 

The following persons have been elected to membership in the Institute: 

Abbey, Helen, M.A. (Michigan) Stat., Bur. of Records & Stat. Mich. Dept, of Health, 916 
N. Chestnut, Lansing , Michigan. 

Acton, Forman, Ch. E. (Princeton) T/4 Army of the U.S., SED Barracks Area, Oak Ridge, 
Tenn. 

Aitchison, Beatrice, Ph.D. (Johns Hopkins) Econ. & Stat. Analy., I.C.C. 1929 S. St.,
N.W. Wash., 9, D. C.

Aimer, George, A.B. (Western Reserve) Stat. Ohio High. Plan. Sur., 576 So. 18thSt. % 192 
Arlington , Va. 

Bartlett, Maurice, D.Sc. (London) Univ. Lecturer, Cambridge, 137 Chesterton Road,
Cambridge, Eng.

Berwick, Leo, A.B. (New York Univ.) Capt., A. C. Asst, to Surgeon Stat. Unit of Psych. 
Sect. Hq. AFTRC, T & P Bldg., Fortworth 8, Texas. 

Blackwell, Asst. Prof. David, Ph.D. (Illinois) Math. Dept. Howard Univ. Wash., D. C.

Borland, James, M.A. (Indiana) Capt., Ex. Officer, Inspect. Office, Pine Bluff Arsenal , 
Ark. 

Brown, Prof. Theo., Ph.D. (Yale) Bus. Stat. Harvard Bus. School, Soldier’s Field, Boston 
63, Mass. 

Bunke, Alfred, M.A. (Columbia) Sen. Stat. N. Y. State Dept, of Labor, 37 Parkwood St. 
Albany 3, N. Y. 

Burington, Asso. Prof. Richard, Ph.D. (Ohio) On leave from Case School of Applied
Science, Cleveland, Ohio, at Present, Head Math., Bu. Ord. USN 5800 N. Carlin Spring
Rd., Arlington, Va.

Campbell, James, Ph.D. (Edinburgh) Univ. Math. Lecturer, Victoria Univ. Coll. Well,
W.I. New Zeal.

Churchill, Edmund, A. M. (Columbia) 1686 Union Port Road, New York 2, N. Y. 

Cornfield, Jerome, B.S. (New York Univ.) Stat. Dept, of Labor, R.F.D. Hit Herndon, 
Va. 

Cruden, Dorothy, A.B. (California) Stat. in Sampling Sect. Spec. Sur. Div. Bur. of Census 
% Pop. Div. Wash., D. C. 

Daniel, Cuthbert, M.S. (Mass. Inst. Tech.) Stat. Eng., Carbide and Carbon Chem. Corp., 
460 East Drive, Oak Ridge, Tenn. 

David, Florence, Ph.D. (London) Univ. Sect. Stat. Dept. Univ. Coll., London, W.C. 1, 
England. 

De Garis, Prof. Charles, Ph.D. (Johns Hopkins) Univ. of Okla. School of Med., Okla. City, 
4. Okla. 

Echegaray, Miguel, C.E. Ag. Attache to the Spanish Embassy, 2700 16th St. N.W. Wash., 
D. C. 

Ede, Richard, B.S. (Wisconsin) Chemistry Devel. Metallurgist, Gary Works, Car. Steel 
Ill. 647 Fillmore St., Gary, Indiana. 

Ewart, Robert, A.B. (New York Univ.) Research Physicist, Ballistics Dept. Des Moines 
Ord. Plant 688-46th St. Des Moines 12, Iowa. 

Federer, Walter, M.S. (Kansas State) Research-Ag. SteU. Slat. Lab., Iowa State CoU. 
Ames, Iowa. 

Freeman, Richard, B.Sc. (McMaster) Research Chemist. 1 Maple Ave., Hamilton, Ontario, 
Canada. 

Goldrosen, David, B.S. (Worcester Poly Inst.) Lt. USNR Quality Control Officer, Insp. 
of Naval Mat’l. 204 Ward St. Newton Centre, Mass. 

Goodman, Albert, Supervisor Stat. Control, Quality Control, Westinghouse Elec. Corp., 
Essington, Pa. 

Grant, Asst. Prof. David, Ph.D. (Stanford) Dept, of Psych., Univ. of Wis., Madison 6, 
Wisconsin. 

Greenhouse, Samuel, B.S. (City Coll. N. Y.) T/4 U.S. Army, 5815-lSth St. N.W. Wash., 

11, D. C. 

Gretton, Owen, A.B. (Brown) Acting Chief, Ind. Div. Sen. Econ., 10167 Old Bladensburg 
Road, Silver Spring, Maryland. 

Hayden, Byron, A.B. (Geo. Wash. Univ.) Econ. Stat. A. A. F. Wash. D. C. 1801 S. Cleveland
St., Arlington, Va.

Hecht, Bernard, B.E.C. (City Coll, of N. Y.) Tfsgt, 616 Corp., Army-Navy Electronics 
Stand. Agency 42 Washington Village, Asbury Park , N. J. 

Haufek, Lyman, M.B.A. (Northwestern) U. S. Army Hq. ASF, Chief Supply Stat. Unit, 
1121 New Hampshire Ave., N.W., Wash. 7, D. C. 

Kampschaefer, Margaret, A.B. (Indiana) Stat. Bur. of Labor Stat. 1087 E. Blackford 
Ave., Evansville, 18, Indiana. 

Kozakiewicz, Wacław, Ph.D. (Warsaw) Inst. in Math., Univ. of Saskatoon, Saskatoon,
Canada.

Laguardia, Prof. Rafael, Director of Math & Stat. (Univ. of Uruguay) Fine Hall, Princeton
Univ., Princeton, N. J.

Leighton, Walter, Ph.D. (Harvard) On leave at Northwestern as Director, Applied Math.
Group (NORC) Lecturer in Math. The Rice Inst. 1704 Judson Ave., Evanston, Illinois.

Lieblein, Julius, M.A. (Brooklyn Coll.) Econ. Anal. Room 4013, U. S. Trea. Dept., 16th 
<fe Penna. N.W. Wash. 26, D. C. 

Lien, Roy, M.S. (Oregon State) Rate Stat., Northwestern Elect. Co., Portland, Oregon, 
8121 S.E. Division St., Portland 2, Oregon. 

Lonseth, Asst. Prof. Arvid, Ph.D. (California) Math. Dept. Northwestern University,
Evan., Ill.

Michalup, Eric, Ph.D. (Univ. of Vienna) Math & Astronomy Actuary, Apartado 848,
Caracas, Venezuela.

Monro, Sutton B.S. (Mass. Inst. Tech.) Head of Str. Staff Unit. Amm. Div. Naval Ord. 

Lab. Lt. USNR 8488 Martha Custis Dr. Alexandria, Va. 

Nilson, Hugo, Ph.D. (Minnesota) Chemist in Charge Fishery Tech. Lab. U. S. Fish &
Wildlife Serv. College Park, Maryland.

Nichols, Russell, B.A. (DePauw) Sergeant , U. S. Army Co. A. 586 A 1 , Kn APO 655, 
NYC (88-745-907). 

O’Neil, Frank (Lowell Textile Inst.) Senior Textile Technician, Worsted Division, Pacific 
Mills, Lawrence, Mass. 

Rappaport, Gladys, B.A. (Hunter) Jr. Stat. Slat. Research Group, Columbia, Univ., 
8180 Tiehout Ave., Bronx 57, New York. 

Rice, Assoc. Prof. Nelson, Ph.D. (C. U. of A.) 8886 18th St. N.E., Wash., 17, D. C. 

Schell, Emil, M.A. (Western Reserve) Stat. Employment Stat. Div. 8440 N. 18 Hd.
Arlington, Va.

Schneberger, Richard, (Cert, to teach in Tech High School Training for Industry State 
Programs) %Edison Gen. Elec. Appl. Co., 5600 W. Taylor St., Chgo., III. 

Simon, Geo., Ed. M. (Harvard) Capt., A. C. Avia. Psych. Psych. Section, Surgeon, Ilq. 
AFTRC, Ft. Worth 8, Texas. 

Spaulding, Asa, M.A. (Michigan) Actuary & Asst. Sec. No. CarolinaMut. Life. Ins. Co. 
Durham, North Carolina. 

Spoerl, Charles, B.A. (Harvard) Asst. Treas. % Aetna Life Ins. Co. Hartford, Conn.

Springer, Wm., C.E. (Columbia) Asst. Vice Pres. in charge of Research, Bristol-Myers Co.
Hillside 6, New Jersey.

Stock, J. Stevens, M.A. (American) Lt. USNR, Hd. Stat. Sect. Div. of Shore Est. & 
Civilian Per. Navy Dept., 8608 Garfield St., Bethesda, Maryland. 

Stott, Alex, A.B. (Harvard) Lt. Comdr. USNR, 8800 Devonshire Pl., N.W., Wash. 8, D. C.

Taylor, Thomas, Ph.D. (Yale) Research Engineer, U. S. Testing Co. 45 Grover Lane,
Caldwell, N. J.

Treanor, Glen, B.A. (Minnesota) Principal Tax Economist, Bus. & Ind. Research Div., 
Income Tax Unit, Bur. of Int. Rev., Room 8888, Wash., D. C. 

Wherry, Robert, Ph.D. (Ohio State) On leave Dept. Psych. Univ. of N. C., Civilian Head, 
Stat. Anal. Unit AGO Personnel Research Section, 870 Madison Ave., N. Y . 

Wilcoxon, Frank, Ph.D. (Cornell) Group Leader, Insecticide & Fungicide, La., Amer. 

Cyanamid Co., Stamford, Conn. R.D.ftl Box 89a, Riverside, Conn. 

Wolff, Marlon, A.B. (Hunter) Asst. Math. Stat. Stat. Research Group Div. of War 
Research Columbia University 1784 Crotona Park East, New York 60, N. Y. 

Unknown Addresses 

Recent mail has not been delivered to the following members of the Institute 
at the addresses listed. If anyone knows of the current address of one or more 
of these members, please notify the Secretary-Treasurer at once. 

Lt. (jg) Gordon L. Beckstead—Aer. Navy 151 % Fleet Postmaster, San Fran., Cal. 

Dr. Charles Wm. Cotterman—637 Hawthorne Road, Winston Salem, North Carolina 
Mr. James Davidson—Box 344, Christiansburg, Virginia 

S/sgt George Elmstrom—Det. of Pat., Hospital Plant. 4176 APO % PM, NYC, N. Y. 

Mr. Henry Goldberg—401 W. 118th St. New York 27, New York 

Mr. Henry Hebley—Box 166, Pittsburgh 30, Pennsylvania 

Mr. John Mandel—45 Kew Gardens Road, Kew Gardens, New York 

Mr. David F. Votaw, Jr., USNTC—Bainbridge, Maryland 

Mr. Edward F. Wilson—Keswick Colony, Keswick Grove, New Jersey 



REPORT ON THE RUTGERS MEETING OF THE INSTITUTE 

The Eighth Summer Meeting of the Institute of Mathematical Statistics
was held at the New Jersey College for Women, Rutgers University, New Brunswick,
New Jersey on Sunday, September 16, 1945, where the Summer Meeting
of the American Mathematical Society was also being held. The following
115 members of the Institute attended the meeting:

C. B. Allendoerfer, R. L. Anderson, T. W. Anderson, H. E. Arnold, I. L. Battin, Archie
Blake, C. I. Bliss, P. Boschan, A. H. Bowker, A. E. Brandt, G. W. Brown, R. H. Brown, T. 
H. Brown, T. A. Budne, R. S. Burington, B. H. Camp, A. G. Carlton, P. C. Clifford, E. P. 
Coleman, T. F. Cope, G. M. Cox, H. B. Curry, J. H. Curtiss, J. F. Daly, J. H. Davidson, B. 
B. Day, W. E. Deming, H. F. Dodge, Jacques Dutka, P. S. Dwyer, Churchill Eisenhart, 
Wade Ellis, Mary Elveback, Benjamin Epstein, C. D. Ferris, C. H. Fischer, M. M. Flood, 
R. M. Foster, Milton Friedman, J. P. Gill, M. A. Girshick, Casper Goffman, A. A. Goodman, 
Dorothy K. Gottfried, T. N. E. Greville, F. E. Grubbs, K. W. Halbert, Marshall Hall, P. 
R. Halmos, Miriam S. Harold, Millard Hastay, Bernard Hecht, William Hodgkinson, I. S. 
Hoffer, Harold Hotelling, A. S. Householder, W. Hurwicz, Irving Kaplansky, C. J. Kirchen,
Jack Laderman, Rafael Laguardia, H. G. Landau, Howard Levene, Harriet Levine, S. B. 
Littauer, A. T. Lonseth, P. J. McCarthy, W. G. Madow, J. W. Mauchly, E. B. Mode, D. J. 
Morrow, J. E. Morton, Judith Moss, P. M. Neurath, M. L. Norden, H. W. Norton, C. O. 
Oakley, P. S. Olmstead, Edward Paulson, John Riordan, H. E. Robbins, H. G. Romig, William
Salkind, M. M. Sandomire, Arthur Sard, F. E. Satterthwaite, L. J. Savage, Henry Scheffé,
Bernice Scherl, Edward Schrock, I. E. Segal, C. E. Shannon, L. W. Shaw, Herbert Solomon, 
Mortimer Spiegelman, J. R. Steen, Arthur Stein, F. F. Stephan, A. P. Stergion, L. V. 
Toralballa, Mary N. Torrey, A. W. Tucker, L. R. Tucker, J. W. Tukey, Helen M. Walker, 
W. A. Wallis, R. M. Walter, B. T. Weber, Joseph Weinstein, A. E. R. Westman, Frank Wilcoxon, 
S. S. Wilks, Jacob Wolfowitz, C. P. Winsor, Ruth Zwerling. 

The first session, on Sunday morning, was devoted to a symposium on Sequential 
Analysis. Professor W. Allen Wallis, of Stanford University and Columbia 
Statistical Research Group, acted as chairman for this session. The following 
invited addresses were given. 

1. Theory of Sequential Analysis. 

Professor A. Wald, Columbia University and Columbia Statistical Research Group. 

2. Construction of Multiple Sampling Inspection Plans for Attributes from Sequential 
Principles. 

Mr. Milton Friedman, National Bureau of Economic Research and Columbia 
Statistical Research Group. 

3. Applications of Sequential Analysis to the Ranking of Two Populations with Respect to 
a Single Parameter. 

Mr. M. A. Girshick, Bureau of Agricultural Economics and Columbia Statistical Research 
Group. 

The morning session was concluded after lively discussion on the symposium 
topic. 

Dr. W. Edwards Deming, of the Bureau of the Budget and President of the 
Institute, presided at the afternoon session. The following papers were presented: 

1. On The Variance of a Random Set in n Dimensions. 

Dr. Herbert E. Robbins, Post Graduate School, Annapolis. 

2. The Non-Central Wishart Distribution and its Application to Problems In Multivariate 
Statistics. 

Dr. T. W. Anderson, Jr., Princeton University. 

3. The Effect on a Distribution Function of Small Changes in the Population Function. 

Professor Burton H. Camp, Wesleyan University. 

4. On Composite Distributions. 

Dr. Casper Goffman and Dr. Benjamin Epstein, Westinghouse Electric Corp. 

5. Population, Expected Values and Sample. 

Professor Emil J. Gumbel, New School for Social Research. 

6. On the Selection of a Sample in Repeated Steps. 

Dr. W. G. Madow, Bureau of the Census. 

7. On Optimum Estimates for Stratified Samples. 

Mr. Morris H. Hansen and Mr. William N. Hurwitz, Bureau of the Census. Presented 
by Margaret Gurney. 

8. Pearsonian Correlation Coefficients Associated With Least Squares Theory (Presented 
by Title). 

Professor P. S. Dwyer, University of Michigan. 

The afternoon session concluded with the report of the Committee on the 
Teaching of Statistics which was presented by Professor Harold Hotelling of 
Columbia University. 

P. S. Dwyer, 
Secretary 



ON THE NORMAL APPROXIMATION TO THE BINOMIAL 
DISTRIBUTION 

By W. Feller 
Cornell University 

1. Although the problem of an efficient estimation of the error in the normal 
approximation to the binomial distribution is classical, the many papers which 
are still being written on the subject show that not all pertinent questions have 
found a satisfactory solution. Let, for a fixed $n$ and $0 < p < 1$, $q = 1 - p$, 

(1) $T_k = \binom{n}{k} p^k q^{n-k}, \qquad P_{\lambda,\nu} = \sum_{k=\lambda}^{\nu} T_k .$ 

For reasons of tradition (and, apparently, only for such reasons) one sets 

(2) $z_k = (k - np)\sigma^{-1}, \qquad \sigma = (npq)^{1/2},$ 

and compares (1) with 

(3) $N_k = (2\pi)^{-1/2}\sigma^{-1}e^{-z_k^2/2}$ and $\Pi_{\lambda,\nu} = \Phi\left(z_\nu + \frac{1}{2\sigma}\right) - \Phi\left(z_\lambda - \frac{1}{2\sigma}\right)$ 

respectively,¹ where $\Phi(z)$ stands for the normalized error function. Many 
estimates are available for the maximum of the difference $|P_{\lambda,\nu} - \Pi_{\lambda,\nu}|$ for all $\lambda, \nu$. 
Now this error is $O(\sigma^{-1})$ and even a precise appraisal will break down in the two 
most interesting cases: if $\sigma$ is small, or if $\lambda$ and $\nu$ are large as compared to $\sigma$. 
Indeed, even for moderately large values of $k$ (such as are usually considered) 
the contribution of $T_k$ to the sum in (1) will be considerably smaller than $\sigma^{-1}$ 
so that any estimate of the form $O(\sigma^{-1})$ leaves us without guidance. With some 
modifications this remains true also for more refined estimates like Uspensky's 
remarkable result² 

(4) $P_{\lambda,\nu} = \Pi_{\lambda,\nu} + \frac{q - p}{6(2\pi)^{1/2}\sigma}\left[(1 - z^2)e^{-z^2/2}\right]_{z_\lambda - 1/(2\sigma)}^{z_\nu + 1/(2\sigma)} + \omega$ 

with 

$|\omega| < (.13 + .18\,|p - q|)\sigma^{-2} + e^{-3\sigma/2}$ 

provided $\sigma > 5$. What is really needed in many applications is an estimate of 
the relative error, but this seems difficult to obtain. 

It should also be noticed that the accuracy of the normal approximation to the 
binomial is by no means quite as good as many texts would make appear. Examples 
using $p = 1/2$ and intervals which are symmetric with respect to $np$ are hardly 
conclusive, since there the main error term drops out and systematic positive 
and negative errors cancel. Again, in practice comparatively small $\sigma$ and 
comparatively large $\nu$ are frequently used. It works well to compare a $P_{\lambda,\nu}$ of a 
numerical value, say, .93 with a corresponding value $\Pi_{\lambda,\nu}$ of, say, .95. In 
classroom discussions the error may seem insignificant. However, in most actual 
applications one would consider the complementary probabilities, and the very 
same figures mean an approximation .05 to the correct value .07. If a confidence 
limit is set to the five per cent level, the normal approximation would in our 
example mean that two out of seven critical cases are missed. Consider next the 
example $p = 1/10$, $n = 10{,}000$. For values of $k$ around 1120 the relative error of 
$N_k$ is about .30; it increases rapidly with increasing $k$. Around $k = 1150$ the 
relative error exceeds 2/3, around 1180 it is nearly 1.4. And yet this example 
is conservative in comparison with many cases where the normal approximation 
is used in practice. 

¹ Very often the limits $z_\lambda$ and $z_\nu$ instead of $z_\nu + \frac{1}{2\sigma}$ and $z_\lambda - \frac{1}{2\sigma}$ are used. This naturally 
results in an unnecessary systematic undervaluation. 

² Uspensky [3], p. 129. A two-term development of T r with an error of 0(v”*) valid for 
| * | < 2, a > 3 has been given by Mirimanoff and Dovaz [1927]. 
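
The growth of the relative error in the tail can be checked numerically. The following is a minimal sketch, not the author's computation: it assumes $p = 1/10$, $n = 10{,}000$ and the classical norming (2), and it reports the ratio $T_k/N_k$ (the function names are illustrative). The figures it prints need not coincide with those quoted in the text, since they depend on how the relative error is normalized and on the precision of the original hand calculation.

```python
import math

def log_binom_term(n, k, p):
    """log T_k, with T_k = C(n, k) p^k q^(n-k); lgamma avoids underflow."""
    q = 1.0 - p
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(q))

def log_normal_term(n, k, p):
    """log N_k as in (3), using z_k and sigma from the classical norming (2)."""
    sigma = math.sqrt(n * p * (1.0 - p))
    z = (k - n * p) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z

def ratio_exact_to_normal(n, k, p):
    """T_k / N_k; the further from 1, the larger the relative error of N_k."""
    return math.exp(log_binom_term(n, k, p) - log_normal_term(n, k, p))

if __name__ == "__main__":
    # p = 1/10 is an assumption made for this illustration.
    for k in (1000, 1120, 1150, 1180):
        print(k, ratio_exact_to_normal(10_000, k, 0.1))
```

The ratio stays close to 1 at the center $k = np$ and grows steadily as $k$ moves into the upper tail, which is the qualitative behavior described above.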

It is surprising that the classical norming (2) is generally accepted although 
there does not seem to exist any deeper reason for it. The use of moments, 
though usually very convenient, does not necessarily lead to best results. For 
example, the density function 

(5) $f_n(x) = \frac{1}{n!}\,x^n e^{-x}$ 

is the $(n + 1)$-fold convolution of $f_0(x)$ with itself and therefore, for large $n$, 
of nearly normal “type.” The conventional norming would approximate 
$f_n(x)$ by $\{2\pi(n + 1)\}^{-1/2}e^{-[x - (n+1)]^2/2(n+1)}$, while the use of the norming factor $n$ 
instead of $(n + 1)$ seems clearly indicated. 
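
A small numerical comparison illustrates the remark. This sketch assumes that the density in (5) is $x^n e^{-x}/n!$ and that "norming factor $n$" means replacing $n + 1$ by $n$ in both the center and the variance of the normal density; it compares the two approximations only at the mode $x = n$, and the function names are illustrative.

```python
import math

def gamma_density(n, x):
    """x^n e^(-x) / n!, the (n+1)-fold convolution of e^(-x) with itself."""
    return math.exp(n * math.log(x) - x - math.lgamma(n + 1))

def normal_density(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mode_errors(n):
    """Absolute errors at x = n of the moment-based and the n-based norming."""
    exact = gamma_density(n, float(n))
    moment_based = normal_density(n, n + 1, n + 1)  # conventional: mean and variance n + 1
    n_based = normal_density(n, n, n)               # assumption: n replaces n + 1 throughout
    return abs(moment_based - exact), abs(n_based - exact)

if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, mode_errors(n))
```

At the mode the $n$-based approximation is the closer of the two for each $n$ tried; the sketch makes no claim about the error elsewhere on the axis.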

Actually, as will be seen, it is natural (at least for small values of $k - np$) 
to replace (2) by 

(6) $x_k = \{k + \frac{1}{2} - (n + 1)p\}\sigma^{-1},$ 

and accordingly to approximate $P_{\lambda,\nu}$ by the error integral taken between the limits 

(7) $\{\lambda - (n + 1)p\}\sigma^{-1}$ and $\{\nu + 1 - (n + 1)p\}\sigma^{-1}$. 

For example, let $p = 1/10$, $n = 500$, $\lambda = 50$, $\nu = 55$. The correct value is $P_{50,55} = 
.317573$; the norming (2) leads to $\Pi_{50,55} = .32357$, while the more natural limits 
(6) lead to an approximation .31989. More important are the quite unexpected 
simplifications which the norming (6) permits when one studies the error for 
large $x_k$ or small $\sigma$. 
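
The example can be recomputed directly. The sketch below assumes $p = 1/10$, $n = 500$, $\lambda = 50$, $\nu = 55$, evaluates the exact sum (1), the classical approximation (3) with the norming (2), and the error integral between the limits (7). For the latter it takes $\sigma^2 = (n + 1)pq$, which is an assumption; the sketch does not attempt to reproduce the third quoted figure, whose exact recipe is not spelled out at this point. Function names are illustrative.

```python
import math

def phi_cdf(z):
    """Normalized error function (standard normal distribution function)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exact_interval(n, p, lam, nu):
    """P_{lam,nu}: sum of the binomial terms T_k for lam <= k <= nu."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(lam, nu + 1))

def classical_interval(n, p, lam, nu):
    """Pi_{lam,nu} of (3): limits z_lam - 1/(2 sigma) and z_nu + 1/(2 sigma)."""
    sigma = math.sqrt(n * p * (1.0 - p))
    return phi_cdf((nu + 0.5 - n * p) / sigma) - phi_cdf((lam - 0.5 - n * p) / sigma)

def shifted_interval(n, p, lam, nu):
    """Error integral between the limits (7); sigma^2 = (n + 1) p q is assumed."""
    sigma = math.sqrt((n + 1) * p * (1.0 - p))
    center = (n + 1) * p
    return phi_cdf((nu + 1 - center) / sigma) - phi_cdf((lam - center) / sigma)

if __name__ == "__main__":
    args = (500, 0.1, 50, 55)
    print(exact_interval(*args), classical_interval(*args), shifted_interval(*args))
```

Under these assumptions the limits (7) come closer to the exact value than the classical limits do.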

We are now led to reformulate the problem: instead of starting with arbitrary 
limits for the error integral and estimating the resulting error, we shall try to determine 
the limits so as to minimize the error. Theoretically, for any given $\lambda, \nu$ these limits 
could be determined so as to give an exact value for $P_{\lambda,\nu}$. However, such limits 
would depend in the most intricate way on $\lambda$ and $\nu$. For practical purposes one 
would restrict the considerations to certain simple functions such as polynomials. 
We shall here consider only the case where the limits are at most quadratic 
polynomials. Essentially our problem seems that treated by Serge Bernstein 
(and, apparently, only by him). In a series of papers since 1924, S. Bernstein 
has considered the accuracy of the normal approximation. Quite recently³ 
he has, by a considerable computational effort, extended the range of validity 
from $npq > 365$ to $npq > 62.5$ and proved the following 
Theorem (S. Bernstein): Let 

(8) npq >62.5 

and let a x , be the solutions of the quadratic equations 

x | np — a x {npq) m + q a x 

(9) 

x + i - np = 0 x (npq) m + --g-? Pi ■ 

If 

(10) a > 0, 0 < 2 v \npq) m 

then 

(11) *(« ~ *(ft) ^ Px,, < *(«,) - *(<*). 

The conditions (10) are practically equivalent to 

(12) X > np + l v <np + 2 l, V'*. 

The remarkable feature of this excellent result is that the error remains $O(\sigma^{-1})$ 
throughout an interval which increases with $\sigma$ (instead of the conventional 
uniformly bounded intervals). 

In the sequel it will be shown that startling simplifications can be obtained if 
the norming (6) is used from the beginning instead of (2). Our main result is an 
improvement of S. Bernstein's theorem. The condition (8) will be replaced by 
$(n + 1)pq > 9$. The first condition in (10) will be relaxed to $k > (n + 1)p$, that 
is to say, our theorem will hold for all $k$ exceeding the central value (for those less 
than the central value an analogous theorem holds); in the other condition (10), 
the numerical value $2^{1/2}$ will be replaced by an arbitrary constant. Instead of 
quadratic equations, we shall consider quadratic polynomials. And finally, the 
gap between the two sets of limits will be reduced. 

It will be seen that the computations leading to this improvement are almost 
negligible in comparison with S. Bernstein's deeper method; with slightly more 
sophisticated arguments and numerical evaluations, our results can be 
considerably improved. Our consideration will be based on a new expression for 
$T_k$, in which only exponential terms appear but the usual square root is missing. 

³ S. Bernstein [1]; the first paper of the series appears to have appeared in Ucenye Zapiski, 
Kiev, 1924. 
In passing from approximations to $T_k$ to approximations to $P_{\lambda,\nu}$ one has to 
replace sums by integrals. This procedure is cumbersome if an estimate of the 
relative error is desired. Euler's formula and other standard formulas are of 
little use. We shall therefore start with a lemma which, it is hoped, may be 
useful in this connection; it will therefore be proved in a slightly more general 
form than actually required for the present paper. 

2. Lemma 4 1. For 0 < h < $ and | xh | < 1 


r x+h/t 

[ e^du = &r i ' a+( * i - m,/ i 4+ "* 4 

Jx—h/2 


-JL < w < JL. 

880 ~ “ 285 


Proof. Denote the integral in (13) by J. Then 


bT l e x ' n J = h- 1 £ 


e ~xt~t*n dt 


*hl 2 

= 2h~ l chxte~ t>n dt. 
Jo 


We begin by showing that for 0 < a < \ 

(16) < cha < 

In fact 

«7, i (• + i + + (9 i 1 + i +1' $ >- 

and 

m 4 + f + tX 1 - b) * 1+ i + «rrr * 


It follows from (15) and (16) that 


hT'e’^J > 2 hT 


e tx*-X)t*it--x*t*/bb ^(x«-1X«/3-4* 4 <</66 ^ 


Jx*-l)t*lt-x*t*lbb j\ l X 2 — l 4 2 4 X A t K 

e r + _ 3~ f ~w 


/»n/z 

> 2/f 1 / 

Jo 

_ g/i” 1 [^ e (af *~ 1)<2/6 “ ar4<4/65 ]* /2 


which proves one part of the lemma. 

To obtain an upper estimate we make use of the inequalities 


„(sc*-l)tV3 


4 +¥) 


-<t/8-hp4|4/i8 


4 The fraction i is chosen quite arbitrarily; if h be restricted to 0 < h < 1 the first member 

of (14) remains unchanged, while the fraction — on the right side has to be replaced by —. 

285 264 
(30) < (l + - 0e‘ 4 “ ,u • ^ 

1 ~3 

c x4<4/l8+A4/2W 

Using (16) and (20), the proof of the second part of the lemma follows from a 
computation analogous to (19). 
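
The content of Lemma 1 can be probed numerically. The sketch below rests on an assumption about the garbled statement (13): that the integral $J$ of $e^{-u^2/2}$ over $(x - h/2,\, x + h/2)$ equals $h\exp\{-x^2/2 + (x^2 - 1)h^2/24 + \text{remainder}\}$, with a remainder of the fourth order in $h$. It computes that remainder directly; the function names are illustrative and no attempt is made to verify the specific constants of (14).

```python
import math

def gauss_integral(x, h):
    """J: integral of exp(-u^2/2) over [x - h/2, x + h/2], evaluated via erf."""
    s = math.sqrt(2.0)
    return math.sqrt(math.pi / 2.0) * (
        math.erf((x + h / 2.0) / s) - math.erf((x - h / 2.0) / s))

def residual(x, h):
    """log(J/h) minus the assumed leading terms -x^2/2 + (x^2 - 1) h^2 / 24."""
    return (math.log(gauss_integral(x, h) / h)
            + x * x / 2.0 - (x * x - 1.0) * h * h / 24.0)

if __name__ == "__main__":
    for x, h in ((0.0, 0.4), (1.0, 0.4), (2.0, 0.4), (1.0, 0.2)):
        print(x, h, residual(x, h))
```

Halving $h$ at fixed $x$ reduces the remainder by a factor close to 16, as expected of a fourth-order term.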

For our purposes it is convenient to use Stirling’s formula in a form which is 
not quite the usual one. 

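Lemma 2 (Stirling's formulas). For $n > 4$, 

(21) $n! = (2\pi)^{1/2}\left(n + \frac{1}{2}\right)^{n + 1/2}\exp\left\{-\left(n + \frac{1}{2}\right) - \frac{1}{24(n + \frac{1}{2})} + \frac{7}{2880}\,\frac{1 - \theta_1}{(n + \frac{1}{2})^3}\right\},$ 

or 

(22) $n! = (2\pi)^{1/2}\,n^{n + 1/2}\exp\left\{-n + \frac{1}{12n} - \frac{1 + \theta_2}{360n^3}\right\},$ 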
where 

(23) | di | < £, t? —► 0 as n —> oo. 

Formula (21) can be derived from the gamma function or in any other way 
that leads to the standard form (22).⁵ 
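
As a check on the form of Stirling's series centered at $n + \frac{1}{2}$, the following sketch compares $\log n!$ with the leading terms $\frac{1}{2}\log 2\pi + (n + \frac{1}{2})\log(n + \frac{1}{2}) - (n + \frac{1}{2}) - 1/\{24(n + \frac{1}{2})\}$. That (21) has exactly this shape, with a correction of size $7/\{2880(n + \frac{1}{2})^3\}$, is an assumption made in reading the garbled formula; the function names are illustrative.

```python
import math

def stirling_half_shift(n):
    """Assumed leading part of log n!, expanded around m = n + 1/2."""
    m = n + 0.5
    return 0.5 * math.log(2.0 * math.pi) + m * math.log(m) - m - 1.0 / (24.0 * m)

def remainder(n):
    """log n! minus the leading part; expected to be about 7 / (2880 m^3)."""
    return math.lgamma(n + 1) - stirling_half_shift(n)

if __name__ == "__main__":
    for n in (4, 10, 20):
        m = n + 0.5
        print(n, remainder(n), 7.0 / (2880.0 * m**3))
```

For the values tried, the remainder is positive and slightly below $7/\{2880(n + \frac{1}{2})^3\}$, consistent with a small positive $\theta_1$.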
3. From now on we shall put 

(24) $\sigma^2 = (n + 1)pq$ 

(25) $x_k = \{k + \frac{1}{2} - (n + 1)p\}\sigma^{-1};$ 


the subscript $k$ will be omitted whenever no confusion is to be feared. To 
transform $T_k$ we shall use (21) for the factorials in the denominator, but (22) for 
$(n + 1)!$ in the numerator. 


6 A simple proof runs as follows. Put B n * n!(n + Then 

1 ?*z! __ V Z 1 _\ — » L l±J} 

° g B p \6 2v{2v + 1)) (2p)*' * 60 (2p) 4 

with 0<5i<~jifp^5. From here (21) follows using the fact that 
T log - log B„ - J log (2x) 

p-„-fl 

and that for n ifi: 4 


1 - a * 1 1 

3(n + J)» < ^5 < 3(n + *)* 


with 0 < 3 < rr. In this Way the estimate (23) can be considerably improved. 
25 
Then 


log ((2T) j <r7\) = (n + 1) log (» + 1) - (fc + J) log — ti 

V 


(26) 


(27) 


(n - * + i) log--+ 


1 


+ 

+ 


12(n +D 24 (fc + i) 

1 

24(« - k +T) “ P 


(, + ?W> + f)-7('-=W-=) 


+ .W + 
12^ t 


24^(l+f) + 24. 2 (l-^) ' 

0 < ^ 7 / i 7 r i i i 

- P ~ 6 \360(n + l) 5 ^ 2880 |_(A* + §)* (n - fc + D’J 


1 I 


- (i'360a«\ P 9 + £ (P + 


provided only that k > 4, (n — k) >4. Asymptotically p is equivalent to the 
right-hand member without factor J (which, by the way, could be replaced by 
1 + A)* Obviously 

<*> 0 < ' < sm?’ 


if k > 4, n — k > 4. We shall consider later on the case a > 3, 1*1 < §*; 
then clearly A; > 4, n — A; > 4, so that the use of (28) will be justified. Expand¬ 
ing (26) into a power series we obtain 
Theorem. If k > 4, n — k > 4, 


T k = (2rr l cr l exp ^ £ 


(29) 


+ 24a 2 ? * P 


(- «r‘ iL 

T 1 " 

ij 2 + 1+2p? 


-1) 

(- ?)' 


24 <r 2 


— P 


where $\rho$ satisfies (28) (and (27)); $x$ and $\sigma$ are defined by (25) and (24), respectively. 

Each term of the second series will usually be small as compared to the 
corresponding term of the first series; the second series can therefore, if desired, be 
absorbed in the error term. If $x$ is small the first term of the first series will be 
preponderant. However, as $x$ increases, more and more terms will make 
themselves noticeable; if $x \sim \sigma^{3/2}$, three terms will be essential, and so on. 

Formula (29) permits us to approximate $P_{\lambda,\nu}$ by means of integrals. The 
tangent rule would suggest comparing $T_k$ to 

(30) $\Phi\left(x_k + \frac{1}{2\sigma}\right) - \Phi\left(x_k - \frac{1}{2\sigma}\right),$ 

and (29) together with Lemma 1 permits one easily to estimate the relative error 
in the practically most important cases. It is also seen that the limits in (30) 
are essentially the only limits depending linearly on $\lambda$ and $\nu$ which will render the 
relative error $O(\sigma^{-1})$ for $x = O(1)$. Instead of elaborating on these simple 
questions we proceed to the more intricate problem of limits which are quadratic 
polynomials in $\lambda$ and $\nu$. 

4. For brevity we shall from now on put 


(31) 



The estimate $|a| \le \frac{1}{6}$ will be used constantly. It obviously suffices to consider 
values of $\lambda < \nu$ which exceed the central value $[(n + 1)p]$. 

Theorem. Suppose that 

(32) er > 3 

and 


(33) X > ( n + l)p v + £ < (n + 1 )p + |<r 2 . 
Then 

(34) Px„ < e (*(*+,) - *(*)), 


if 

(35) 


Ik 


k 


(n + 1 )p + a j k — (n + \)p 
a a\ a 


2a __ 1 
a 2a 2 ’ 


while the inequality in (34) is reversed if 


(36) ik- 
where 

(37) 


k - 


(n +J )p + a J k - ( n + l)p 

c t a\ ar 


+ 


2o M JL 

a ' 6<r la 


£? = [v + £ - (n + 1 )p \* 
a a 4 


The gap between the limits (35) and (36) is 0( a" 1 ) if xl = 0(a). In S. Bern¬ 
stein^ case (12), M < \/2 and the gap is about 2/(5<r). It will be seen from 
the proof that it requires only routine computations to improve the correction 

term + ^j<T l in (36). 

Proof. Put 


(38) 


. i a> 2 

£* = *k H— x k , 
a 


again suppressing the subscripts wherever convenient. As a consequence of 
(33), we shall be concerned only with values x k satisfying 


1 , . 2 

2i <X< S 


(39) 
Consider first the main series in (29) and write 


v- ?r l - (- gy 

t «'(•' -1) 


“if* + A, 


where 


4 -(P* + Q 9 _<*V , V' P " 1 ~(~qT 

V 12 2/<r 4 5 v(v — 1) 


We shall require some estimates of A. First consider the case a > 0. Then all 
terms of the series are positive, while the expression within parentheses assumes 
its minimum A for P * By (39) £ < x, whence 

(42) A> Ji? if ° > °- 

If a < 0 the signs in the series (41) alternate, each negative term being smaller 
in absolute value than the preceding positive term. Therefore, using (39), 


The expression within braces is a cubic in p which assumes its minimum for p = 
(1 + \/793)/72 = .405. ... It follows that 

<«> if “ <0 

(half of this estimate would actually suffice for our purposes). On the other 
hand, it is evident from (41) that the ratio A /x A attains its maximum for p — 1. 
Therefore, using (39) 

<«> a< tS 


Next we write 


24<r » 


£ Ip *- 1 - (-qr 1 ) 


(sr-v 


i + B, 


whence 


(47) 

A trivial computation analogous to (43) shows that B > 0. Again, if a < 0, 
the signs in the series (47) alternate and in this case 

(48) 2j^~ 144^ <20^* 
If $a > 0$ we can majorize (47) by a geometric series and obtain 


oss *5^s£- 


Now put 


A£* » <j 




(51) $\xi_k + \frac{1}{2}\Delta\xi_k = \xi_{k+1} - \frac{1}{2}\Delta\xi_{k+1}$ 

so that the intervals with endpoints $\xi_k \pm \frac{1}{2}\Delta\xi_k$ are non-overlapping and 
contiguous. Clearly 


A£ = <f 


l t 1 + T { 


Introducing (40), (46), and (52) into (29) we obtain 


T h - (2r) ' A£-exp< - A + B +£- £ - *log 
l a 4 <r 


0+?) 


To appraise the logarithmic term we write 


»"*(> +t)- t- c - 


C £ 2 attains its maximum value when a = — J, and it is readily seen that 
0 < C < if a > 0 

a 2 

o < C < if a < 0. 

U 

Finally we put, with a parameter u to be determined, 


y = £ + 


2a — u 


Ay = A£. 


If one puts 


2tr 4^ 


and 7/4 is defined by (35), then 

(58) yn + £Aj/* = t/4+1, yk — iAj/fc = i/* . 
On the other hand, if 
(59) 


M _ 1 _ a 
6 7 I? 


and is defined by (36), the identities (58) hold again. Accordingly, all we have 
to show is that, with u defined by (57), 

(60) T k < \<Hy k + \Ay k ) - *(y k - ^)\e M ~ Pq)n ^ 

and that the inequality in (60) is reversed if u is defined by (59). 

Elementary transformations lead from (53) to 

(61) T k - (2r)-» Arexpj—| J + (y 2 - 1) + 

where 


(62) E 


u — 4 an 
2<t* 




— A B C — 


p- 


Let now $u$ be defined by (57). In view of Lemma 1 and (61), the inequality 
(60) will be proved if we show that 


(63) 

Now clearly 

(64) 


E, 


£ + T5T £0 - 


y t /] , 4aA _ y*{Ay? ^ y*{AyY 
24<r 2 \ <r / ' 24 - 880 ’ 


Moreover, introducing the estimates (28), (32), (42), (44), (48), (49), and (55) 
into (62) it is seen that for a > 0 


(65) 

and for o < 0 

( 66 ) 






The derivatives of the right-hand members in (65) and (66) are both negative 
for $\xi > 0$. Now we are interested only in values $x$ satisfying (39). For such 
values $\xi \ge \frac{107}{216\sigma}$. For $\xi = \frac{107}{216\sigma}$ the right-hand members in (65) and (66) are 
negative, so that $E_1 < 0$ for $x \ge \frac{1}{2\sigma}$. This proves the first part of our theorem. 

The proof that with (59) the inequality in (60) is reversed proceeds on similar 
lines. We have to show that 


Ei — E 


(Ay) 4 
285 


> 0 . 


(67) 
Suppose that $a < 0$, which is the less favorable case. Then, by (48), (37), and 
(39), 


( 68 ) 

Similarly 

(69) . 


15 ff 20 a 




Using (62) we have therefore, neglecting the non-negative terms B and C, 


(70) 


Et >£-- 


2a 1 24<r< 


u 

3o* 


1 

250O 4 



_ 5 _ _ 8 M ju\ 
72a 3 20<r + 12o*J 


1 

24a 2 



The expression at the right side represents a parabola, and it suffices to show that 
it assumes positive values at the endpoints of our interval (39). Now 


(71) 



and simple arithmetic shows that, with (59) the expression within the braces 
more than counterbalances the negative terms outside. 6 If a > 0 the situation 
is more favorable and the estimate (59) can then be further improved. 
THE VARIANCE OF THE MEASURE OF A TWO-DIMENSIONAL RANDOM SET 


By J. Bronowski and J. Neyman 
Princes Risborough, England and the University of California 

1. Introduction. In a recent paper H. E. Robbins¹ has solved the problem 
of the variance of the measure of a one-dimensional random set. The present 
paper treats a similar problem relating to a two-dimensional random set under 
somewhat more general conditions. 

Let $R$ denote a rectangle of dimensions $a \times b$ whose position is fixed. Let $R'$ 
denote another fixed rectangle concentric with $R$, its sides $a + \gamma$ and $b + \gamma$ (where 
$\gamma > 0$) being parallel to the sides $a$ and $b$ respectively of $R$. Finally, let $\rho$ denote 
a rectangle of fixed dimensions but variable position, whose sides $\alpha < 2\gamma$ and 
$\beta < 2\gamma$ are parallel to $a$ and $b$ respectively, but the position of whose center will 
be considered as random. In fact it will be assumed that the rectangle $\rho$ is 
dropped on the plane of $R$ in a manner which satisfies the following two 
assumptions: 

(i) The probability that the center of $\rho$ falls within $R'$ exactly $s$ times has a 
defined value $P_s$ for each $s = 0, 1, 2, \cdots$. Thus, if $\Psi(u)$ denotes the probability 
generating function of $s$, so that 

(1) $\Psi(u) = \sum_{s=0}^{\infty} u^s P_s,$ 

then $\Psi(u)$ is assumed known but will be left arbitrary till the general result is 
obtained. 

(ii) Whenever a fixed number $s$ of centers of $\rho$ fall within $R'$, it will be assumed 
that the probability that exactly $k$ centers of $\rho$ fall within any chosen sub-area $w$ 
contained in $R'$ is given by the binomial expression 

(2) $\frac{s!}{k!(s - k)!}\left(\frac{w}{R'}\right)^k\left(1 - \frac{w}{R'}\right)^{s-k}.$ 

Under the above conditions, denote by $E$ the set of all those points of $R$ which 
are covered at least once by the rectangle $\rho$ during the course of the trials 
considered. Let $X$ denote the measure of $E$. The purpose of this paper is to 
evaluate the first two moments of $X$. 

First, the computations will be made for the case when $s$ is fixed, i.e. when 

(3) $\Psi(u) = u^s.$ 

The values of the two moments of $X$ computed for fixed $s$ will be denoted by 
$M_1(a, b \mid s)$ and $M_2(a, b \mid s)$. Next, the moments of $X$ will be evaluated for an 
arbitrary generating function $\Psi(u)$, and these will be denoted by $M_1(a, b)$ and 
$M_2(a, b)$. 


¹ H. E. Robbins, “On the measure of a random set”, Annals of Math. Stat., Vol. 15 
(1944), pp. 70-74. 
H. E. Robbins has found the first moment 

(4) $M_1(a, b \mid s) = ab\left\{1 - \left(1 - \frac{\alpha\beta}{R'}\right)^s\right\}.$ 

Also, for a one-dimensional set, he has obtained the second moment, say Af*(a|s),. 
when a < a. 

It follows immediately from (4) and (1) that, whatever be the probability 
generating function $\Psi(u)$, 

(5) $M_1(a, b) = ab\left\{1 - \Psi\left(1 - \frac{\alpha\beta}{R'}\right)\right\}.$ 

In particular, if the probabilities $P_s$ are those of Poisson when the density of 
positions of the center of $\rho$ per unit of area is $\lambda$, so that 

(6) $\Psi(u) = e^{\lambda R'(u - 1)},$ 
then 

(7) $M_1(a, b) = ab\{1 - e^{-\alpha\beta\lambda}\}.$ 
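
The passage from the fixed-$s$ moment to the Poisson case can be checked by direct summation. The sketch below assumes the forms $M_1(a, b \mid s) = ab\{1 - (1 - \alpha\beta/R')^s\}$ and $M_1(a, b) = ab\{1 - e^{-\alpha\beta\lambda}\}$, with $R'$ entering only through its area; the numerical values and function names are illustrative.

```python
import math

def mean_covered_fixed(a, b, alpha, beta, area_r_prime, s):
    """First moment of X when exactly s centers fall within R'."""
    return a * b * (1.0 - (1.0 - alpha * beta / area_r_prime) ** s)

def mean_covered_poisson(a, b, alpha, beta, lam):
    """Closed form for Poisson-distributed s with density lam per unit area."""
    return a * b * (1.0 - math.exp(-lam * alpha * beta))

def mean_covered_mixed(a, b, alpha, beta, area_r_prime, lam, s_max=400):
    """Average of the fixed-s moment over Poisson weights P_s (truncated sum)."""
    mu = lam * area_r_prime          # expected number of centers in R'
    weight = math.exp(-mu)           # P_0
    total = 0.0
    for s in range(s_max + 1):
        total += weight * mean_covered_fixed(a, b, alpha, beta, area_r_prime, s)
        weight *= mu / (s + 1)       # P_{s+1} from P_s
    return total

if __name__ == "__main__":
    # Illustrative values only; area_r_prime stands for the area of R'.
    print(mean_covered_mixed(2.0, 3.0, 0.5, 0.4, 20.0, 1.5))
    print(mean_covered_poisson(2.0, 3.0, 0.5, 0.4, 1.5))
```

The two printed values agree to within rounding, and the area of $R'$ drops out of the Poisson result, as the closed form indicates.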

Our remaining problem, therefore, is that of evaluating the second moment of 
$X$. Instead we shall evaluate the second moment of 

(8) $Y = ab - X,$ 

and shall denote it by $m(a, b \mid s)$ or $m(a, b)$ according as $s$ is or is not considered 
to be fixed. 


2. Derivative of the second moment of Y. In order to evaluate $m(a, b)$, we 
begin by calculating its second (mixed) derivative, say $D(a, b \mid s)$, where 

(9) $D(a, b \mid s) = \frac{\partial^2 m(a, b \mid s)}{\partial a\,\partial b} = \lim_{\Delta a, \Delta b \to 0}\frac{1}{\Delta a\,\Delta b}\{m(a + \Delta a, b + \Delta b \mid s) - m(a, b + \Delta b \mid s) - m(a + \Delta a, b \mid s) + m(a, b \mid s)\} = \lim\frac{1}{\Delta a\,\Delta b}J(\Delta a, \Delta b)$ (say), 

where $\Delta a$ and $\Delta b$ are the increments of $a$ and $b$ respectively. Once $D(a, b \mid s)$ 
is found, the formula for $m(a, b \mid s)$ will be obtained by two quadratures. For 
definiteness we shall assume $\Delta a$ and $\Delta b$ both to be positive, but of course the 
argument which follows applies equally to other cases. 

Consider the rectangle of dimensions $(a + \Delta a)$ and $(b + \Delta b)$ as shown in Figure 
1, and denote by $U$, $V$ and $W$ the measures of the “uncovered” parts of the three 
rectangles $\Delta a \times b$, $a \times \Delta b$, and $\Delta a \times \Delta b$ respectively. That is to say, $U$, $V$ 
and $W$ are defined with respect to these three rectangles precisely in the same 
manner in which $Y$ is defined with respect to the original rectangle $a \times b$ or 
$R$. Using the letter $E$ to denote the expectation, we easily find that 

(10) $J(\Delta a, \Delta b) = 2E(YW) + 2E(UV)$ 
$\qquad\qquad + 2E(UW) + 2E(VW) + E(W^2).$ 
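
The algebra behind (10) can be written out as follows. This is only a sketch of the omitted step, under the assumption (consistent with the definitions of $U$, $V$, $W$) that the uncovered measures of the enlarged rectangles are $Y + U + V + W$, $Y + V$ and $Y + U$ respectively.

```latex
\begin{aligned}
m(a + \Delta a,\, b + \Delta b \mid s) &= E\{(Y + U + V + W)^2\}, \\
m(a,\, b + \Delta b \mid s) &= E\{(Y + V)^2\}, \\
m(a + \Delta a,\, b \mid s) &= E\{(Y + U)^2\}, \qquad m(a, b \mid s) = E(Y^2), \\
(Y + U + V + W)^2 - (Y + V)^2 - (Y + U)^2 + Y^2
  &= 2YW + 2UV + 2UW + 2VW + W^2 .
\end{aligned}
```

Taking expectations of the last identity gives the five terms of (10).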

However, each of the three expectations in the second line of formula (10) is 
infinitesimal of an order higher than the product $\Delta a\,\Delta b$. In fact, none of the 
variables $U$, $V$ and $W$ can exceed the area of the rectangle of which it forms part: 
that is, 

$0 \le U \le b\,\Delta a,$ 

(11) $0 \le V \le a\,\Delta b,$ 

$0 \le W \le \Delta a\,\Delta b.$ 

Figure 1. 

It follows that 

$0 \le E(UW) \le b(\Delta a)^2\Delta b,$ 

(12) $0 \le E(VW) \le a\,\Delta a(\Delta b)^2,$ 

$0 \le E(W^2) \le (\Delta a\,\Delta b)^2.$ 

Hence, from (9), (10) and (12) 

(13) $D(a, b \mid s) = 2\lim_{\Delta a, \Delta b \to 0}\frac{1}{\Delta a\,\Delta b}\{E(YW) + E(UV)\}.$ 

We now reduce the calculation of (13) to finite form by approximating to the 
infinite sets $Y$, $U$, $V$, $W$ by progressively more ample but finite sets. To do so, 
we cover $R'$ by progressively more ample but finite networks of points. More 
precisely: consider a rectangular system of axes $O\xi$ and $O\eta$ oriented as in Figure 1 
so that the axes are common boundaries of $a \times b$ or $R$ and of the rectangles 
obtained by increasing $a$ and $b$. Let 

(14) $d_n = a/(n + 1), \qquad \delta_n = b/(n + 1).$ 


Consider the lattice of points $(ij)$ with coordinates 

(15) $\xi_i^{(n)} = i\,d_n, \qquad \eta_j^{(n)} = j\,\delta_n$ 

for $i = -\nu_1^{(n)}, -\nu_1^{(n)} + 1, \cdots, 0, 1, 2, \cdots, n$; $j = -\nu_2^{(n)}, -\nu_2^{(n)} + 1, \cdots, 
0, 1, 2, \cdots, n$, where $\nu_1^{(n)}$ and $\nu_2^{(n)}$ are the greatest integers such that 

(16) $\nu_1^{(n)} d_n \le \Delta a$ 


and 

(17) $\nu_2^{(n)} \delta_n \le \Delta b.$ 

To simplify the writing, the superscripts $(n)$ will henceforth be dropped. 

With every point $(ij)$ we associate a random variable $x_{ij}$ defined as follows. 
If in the course of the trials contemplated none of the rectangles $\rho$ covers $(ij)$, 
then $x_{ij} = 1$. Otherwise $x_{ij} = 0$. Further, write 
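$Y_n = d_n\delta_n\sum_{i=0}^{n}\sum_{j=0}^{n} x_{ij},$ 

$U_n = d_n\delta_n\sum_{i=-\nu_1}^{0}\sum_{j=0}^{n} x_{ij},$ 

(18) 

$V_n = d_n\delta_n\sum_{i=0}^{n}\sum_{j=-\nu_2}^{0} x_{ij},$ 

$W_n = d_n\delta_n\sum_{i=-\nu_1}^{0}\sum_{j=-\nu_2}^{0} x_{ij}.$ 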

Now the boundary of the set $E$, for a fixed $s$, consists of one or more polygons 
having a finite total number of sides each of bounded length. It follows that, 
given any $\epsilon > 0$, there exists, for a fixed $s$, a number $N_\epsilon(s)$ such that $n > N_\epsilon(s)$ 
implies that 

(19) $|Y_n - Y| < \epsilon,$ 

with similar inequalities relating to $U_n$, $V_n$ and $W_n$. Hence it follows 
immediately that 

(20) $\lim_{n \to \infty} E(Y_nW_n \mid s) = E(YW \mid s), \qquad \lim_{n \to \infty} E(U_nV_n \mid s) = E(UV \mid s).$ 
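The expectations in formula (13) will therefore be obtained as limits of those 
on the left hand sides of (20). We have 

(21) $E(Y_nW_n \mid s) = d_n^2\delta_n^2\sum_{i=-\nu_1}^{0}\sum_{j=-\nu_2}^{0} E\left(x_{ij}\sum_{k=0}^{n}\sum_{l=0}^{n} x_{kl} \mid s\right),$ 

(22) $E(U_nV_n \mid s) = d_n^2\delta_n^2\sum_{i=-\nu_1}^{0}\sum_{j=0}^{n} E\left(x_{ij}\sum_{k=0}^{n}\sum_{l=-\nu_2}^{0} x_{kl} \mid s\right).$ 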
Hitherto we have made no assumptions concerning the values of $\Delta a$ and $\Delta b$. 
Since these are to tend to zero, we may assume that 

(23) $0 < \Delta a < \gamma - \alpha/2, \qquad 0 < \Delta b < \gamma - \beta/2.$ 

On this assumption, we shall now compute the expectations of the type 
$E(x_{ij}x_{kl} \mid s)$, of which (21) and (22) are linear combinations. 

Since the variables $x_{ij}$ and $x_{kl}$ are capable only of the two values unity and 
zero, the expectation of their product is simply the probability that both of them 
are equal to unity, i.e. the probability that both points $(ij)$ and $(kl)$ are “missed” 
by all the $s$ rectangles $\rho$ falling on $R'$. This probability may have one of two 
forms. If both 

(24) $d_n|i - k| < \alpha$ and $\delta_n|j - l| < \beta,$ 
then 

(25) $E(x_{ij}x_{kl} \mid s) = \left\{1 - \frac{2\alpha\beta - (\alpha - d_n|i - k|)(\beta - \delta_n|j - l|)}{R'}\right\}^s,$ 

while otherwise 

(26) $E(x_{ij}x_{kl} \mid s) = \left(1 - \frac{2\alpha\beta}{R'}\right)^s;$ 

in each case, in virtue of the assumption (ii) of Section 1. 

The essential content of equations (24) to (26) is that, once the other variables 
appearing in them are assigned, $E(x_{ij}x_{kl} \mid s)$ is a function only of the differences 
$i - k$ and $j - l$. It is this fact which allows us to evaluate the limits of the 
quantities in (21) and (22) in a simple manner, in effect by holding one of the 
two freely variable points $(ij)$, $(kl)$ in a fixed position, say at the origin. Thus, 
let 

(27) $E(\theta_n \mid s) = d_n^2\delta_n^2\sum_{i=-\nu_1}^{0}\sum_{j=-\nu_2}^{0} E\left(x_{ij}\sum_{k=i}^{n+i}\sum_{l=j}^{n+j} x_{kl} \mid s\right).$ 

Owing to the remark just made, the expectation 

(28) $E\left(x_{ij}\sum_{k=i}^{n+i}\sum_{l=j}^{n+j} x_{kl} \mid s\right) = E\left(x_{00}\sum_{k=0}^{n}\sum_{l=0}^{n} x_{kl} \mid s\right)$ 

and it follows that 

(29) $E(\theta_n \mid s) = (\nu_1 + 1)(\nu_2 + 1)\,d_n^2\delta_n^2\,E\left(x_{00}\sum_{k=0}^{n}\sum_{l=0}^{n} x_{kl} \mid s\right) = [(\nu_1 + 1)(\nu_2 + 1)\,d_n\delta_n]\left[d_n\delta_n\sum_{k=0}^{n}\sum_{l=0}^{n} E(x_{00}x_{kl} \mid s)\right].$ 

Of the two factors in the square brackets in (29), the first tends to $\Delta a\,\Delta b$ as $n$ 
tends to infinity, and the second tends to the integral 

(30) $\int_0^a\int_0^b f(\xi, \eta)\,d\xi\,d\eta$ 

where 

(31) $f(\xi, \eta) = \left\{1 - \frac{2\alpha\beta - (\alpha - \xi)(\beta - \eta)}{R'}\right\}^s$ 

if both $0 < \xi < \alpha$ and $0 < \eta < \beta$, and 

(32) $f(\xi, \eta) = \left(1 - \frac{2\alpha\beta}{R'}\right)^s$ 

otherwise. Thus the computation of the limit of $E(\theta_n \mid s)$ is straightforward. 
It remains to show that it differs from that of $E(Y_nW_n \mid s)$ in equation (21) by 
an infinitesimal which is of an order higher than the product $\Delta a\,\Delta b$. 

Since the variables $x_{ij}$ are capable only of the two values unity and zero the 
absolute value of the difference between the brackets in (21) and (27), that is, 
between 

(33) $x_{ij}\sum_{k=0}^{n}\sum_{l=0}^{n} x_{kl}$ and $x_{ij}\sum_{k=i}^{n+i}\sum_{l=j}^{n+j} x_{kl},$ 

cannot be greater than $-n(i + j) \le n(\nu_1 + \nu_2)$. It follows that 

(34) $|E(Y_nW_n \mid s) - E(\theta_n \mid s)| \le [d_n\delta_n(\nu_1 + 1)(\nu_2 + 1)][n\nu_2 d_n\delta_n + n d_n\nu_1\delta_n].$ 
As $n$ tends to infinity, the right hand side of (34) tends to the product 

(35) $\Delta a\,\Delta b\,[b\,\Delta a + a\,\Delta b];$ 

whence 

(36) $\lim_{\Delta a, \Delta b \to 0}\frac{1}{\Delta a\,\Delta b}\left\{\lim_{n \to \infty} E(\theta_n \mid s)\right\} = \lim_{\Delta a, \Delta b \to 0}\frac{1}{\Delta a\,\Delta b}E(YW \mid s) = \int_0^a\int_0^b f(\xi, \eta)\,d\xi\,d\eta.$ 

A very similar procedure will serve to evaluate the limit of $E(UV \mid s)/\Delta a\,\Delta b$. 
Here, we replace the two freely variable points $(ij)$, $(kl)$ by two semi-fixed points, 
one being restricted to the axis $O\xi$ and the other to the axis $O\eta$. More precisely, 
instead of considering $E(U_nV_n \mid s)$ in equation (22) we consider, say, 

(37) Eton | a) - dl 6* E E *«) 

and it is easy to see that 

(38) . lim | E(U n V n \s) - E{t> n | 8 ) | < &(Ao ) 1 (Ab), 

n—»oo 

so that the quantity (37) may be used in equations (13) and (20) in place of the 
quantity (22). However, since $E(x_{ij}x_{kl} \mid s)$ depends only on the differences 
$i - k$ and $j - l$, 

(39) E(x iS E E x„)=E[x » 3 -E E *w) 

*-* Z--t> 2 \ Jfc-0 Z~-* a / 

and therefore 

(40) £(*„ I a) = {£,(», + 1)1 (<U 2 . E e(x 0 / E E **»l 

Further, and in the same way, we may replace the sum in (40), namely 

(41) E e(xoj E E Xti I a ) = E E E I «) 

by the simpler sum 

E E E\ Xki E I«) “ («>* + 1) E **o E x 0 j | s ) 

fc—0 Z— —«2 \ 7 —z / fc —0 \ J—0 / 


(42) 


(*>2 + 1 iC ® ( X *0 # 0 / | «). 

A:««0 }«0 


It follows that we may replace the limit of $E(U_nV_n \mid s)$ as expressed in (22) by 

(43) $\lim_{n \to \infty}\{d_n(\nu_1 + 1)\,\delta_n(\nu_2 + 1)\}\left\{d_n\delta_n\sum_{k=0}^{n}\sum_{j=0}^{n} E(x_{k0}x_{0j} \mid s)\right\},$ 

and this is easily found to be equal to 

(44) $\Delta a\,\Delta b\int_0^a\int_0^b f(\xi, \eta)\,d\xi\,d\eta,$ 

where $f(\xi, \eta)$ is defined by the formulae (31) and (32). 

Collecting this result with that expressed by (36), and substituting in equation 
(13), we therefore have finally 

(45) $D(a, b \mid s) = 4\int_0^a\int_0^b f(\xi, \eta)\,d\xi\,d\eta.$ 
3. The forms of the derivative. Since the function $f(\xi, \eta)$ has two different 
forms (31) or (32) depending on the relationships between $a$, $b$, $\alpha$ and $\beta$, it will 
be necessary to distinguish four different forms of the derivative (45), and of its 
integral. 

First, for values of $a$ and $b$ for which simultaneously 

(46) $a < \alpha$ and $b < \beta,$ 

the integrand in (45) has the form (31) for the whole region of integration. 
Hence the value of $D(a, b \mid s)$ in the region (46) is given by, say 

(47) $D_1 = 4\int_0^a\int_0^b\left\{1 - \frac{2\alpha\beta - (\alpha - \xi)(\beta - \eta)}{R'}\right\}^s d\xi\,d\eta = 4\int_{\alpha - a}^{\alpha}\int_{\beta - b}^{\beta} g^s(t, \tau)\,dt\,d\tau,$ 

where 

(48) $g(t, \tau) = 1 - \frac{2\alpha\beta - t\tau}{R'}.$ 

Next, when $a \ge \alpha$ but $b < \beta$, the integrand in (45) has the form determined 
by (31) only when 

(49) $0 < \xi < \alpha, \quad 0 < \eta < b,$ 

whereas when 

(50) $\alpha < \xi < a, \quad 0 < \eta < b,$ 

the appropriate form is that determined by (32). Therefore here $D(a, b \mid s)$ 
has the form, say, 

(51) $D_2 = 4b(a - \alpha)\left(1 - \frac{2\alpha\beta}{R'}\right)^s + 4\int_0^{\alpha}\int_{\beta - b}^{\beta} g^s(t, \tau)\,dt\,d\tau.$ 

Similarly, for 

(52) $a < \alpha$ but $b \ge \beta$ 
$D(a, b \mid s)$ is given by, say, 

(53) $D_3 = 4a(b - \beta)\left(1 - \frac{2\alpha\beta}{R'}\right)^s + 4\int_{\alpha - a}^{\alpha}\int_0^{\beta} g^s(t, \tau)\,dt\,d\tau.$ 

Finally, in the region in which simultaneously 

(54) $a \ge \alpha$ and $b \ge \beta,$ 

$D(a, b \mid s)$ has the form, say, 

(55) $D_4 = 4(ab - \alpha\beta)\left(1 - \frac{2\alpha\beta}{R'}\right)^s + 4\int_0^{\alpha}\int_0^{\beta} g^s(t, \tau)\,dt\,d\tau.$ 
4. The second moment of Y. We have now to determine $m(a, b \mid s)$ for all non-negative values of a and b, from the equation

(56)    $\frac{\partial^2 m(a, b \mid s)}{\partial a\, \partial b} = D(a, b \mid s).$

The general solution of this equation is 

(57) m(a, b | s) = £ J* D(a, b \ s ) dadb + A (a) + B(b), 

where A (a) and B(b) are each functions of one variable. These functions are 
determined by the boundary conditions, namely 

(58) «(«, 01.) - «(0, b | •) = = 0 , 

da do 

which are a consequence of the inequality $0 \le Y \le ab$. It is then easily found that the only solution $m(a, b \mid s)$ satisfying (57) and (58) has the following four different forms, depending on the values of a and b.

If $a < \alpha$ and $b < \beta$, then

(59)    $m(a, b \mid s) = \int_0^a \int_0^b D_1(x, y)\, dx\, dy = m_1(a, b \mid s)$ (say).

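If $a \ge \alpha$ and $b < \beta$, then

(60)    $m(a, b \mid s) = m_1(\alpha, b \mid s) + \int_{\alpha}^{a} \int_0^b D_2(x, y)\, dx\, dy = m_2(a, b \mid s)$ (say).

If $a < \alpha$ and $b \ge \beta$, then

(61)    $m(a, b \mid s) = m_1(a, \beta \mid s) + \int_0^a \int_{\beta}^{b} D_3(x, y)\, dx\, dy = m_3(a, b \mid s)$ (say).

Finally, if $a \ge \alpha$ and $b \ge \beta$, then

(62)    $m(a, b \mid s) = m_1(\alpha, \beta \mid s) + \int_{\alpha}^{a} \int_0^{\beta} D_2(x, y)\, dx\, dy + \int_0^{\alpha} \int_{\beta}^{b} D_3(x, y)\, dx\, dy + \int_{\alpha}^{a} \int_{\beta}^{b} D_4(x, y)\, dx\, dy = m_4(a, b \mid s)$ (say).
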

The procedure used to evaluate the integrals (59) to (62) follows the same general pattern, and we shall confine ourselves to outlining it in one case, say (59). There

(63)    $m_1(a, b \mid s) = \int_0^a \int_0^b D_1(x, y)\, dx\, dy = 4 \int_0^a \int_0^b dx\, dy \int_{\alpha-x}^{\alpha} \int_{\beta-y}^{\beta} g^s(t, \tau)\, dt\, d\tau = 4 \int_0^a dx \int_{\alpha-x}^{\alpha} dt \Bigl\{ \int_0^b dy \int_{\beta-y}^{\beta} g^s(t, \tau)\, d\tau \Bigr\}.$

Integrating the double integral in the braces by parts for y we get, say,

(64)    $I(t) \equiv \int_0^b dy \int_{\beta-y}^{\beta} g^s(t, \tau)\, d\tau = \Bigl[ y \int_{\beta-y}^{\beta} g^s(t, \tau)\, d\tau \Bigr]_0^b - \int_0^b y\, g^s(t, \beta - y)\, dy,$

whence, substituting $\beta - y = \tau$ in the last integral,

(65)    $I(t) = b \int_{\beta-b}^{\beta} g^s(t, \tau)\, d\tau - \int_{\beta-b}^{\beta} (\beta - \tau)\, g^s(t, \tau)\, d\tau = \int_{\beta-b}^{\beta} (\tau + b - \beta)\, g^s(t, \tau)\, d\tau.$

Proceeding now in the same manner with the other double integration in (63), we conclude that

(66)    $m_1(a, b \mid s) = 4 \int_0^a dx \int_{\alpha-x}^{\alpha} I(t)\, dt = 4 \int_{\alpha-a}^{\alpha} (t + a - \alpha)\, I(t)\, dt = 4 \int_{\alpha-a}^{\alpha} dt \int_{\beta-b}^{\beta} (t + a - \alpha)(\tau + b - \beta)\, g^s(t, \tau)\, d\tau,$

where, throughout, $g(t, \tau)$ is defined by (48).
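
The reduction from (64) to (65) can be checked numerically. The following is only a sketch under stated assumptions: the integrand `h_fn` is a stand-in for $g^s(t, \tau)$ at a fixed t, with illustrative values ($\alpha = \beta = 1$, $t = 0.5$, $R' = 10$, $s = 3$) that do not come from the paper, and the midpoint rule is an arbitrary choice of quadrature.

```python
def midpoint(f, lo, hi, n=400):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def nested_form(h_fn, b, beta):
    """Integral over y in [0, b] of the integral of h over [beta - y, beta], as in (64)."""
    return midpoint(lambda y: midpoint(h_fn, beta - y, beta), 0.0, b)


def reduced_form(h_fn, b, beta):
    """Single integral of (tau + b - beta) * h(tau) over [beta - b, beta], as in (65)."""
    return midpoint(lambda tau: (tau + b - beta) * h_fn(tau), beta - b, beta)


# Hypothetical stand-in for g^s(t, tau) at fixed t (illustrative values only).
h_fn = lambda tau: (1.0 - (2.0 - 0.5 * tau) / 10.0) ** 3

lhs = nested_form(h_fn, b=0.8, beta=1.0)
rhs = reduced_form(h_fn, b=0.8, beta=1.0)
```

The two values agree up to quadrature error, which is the content of the integration by parts; the same reduction applied to the x-integration gives the weight $(t + a - \alpha)$ in (66).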

Formulae for $m_2(a, b \mid s)$, $m_3(a, b \mid s)$ and $m_4(a, b \mid s)$ are obtained by a similar procedure. They may conveniently be summarized in the following single expression. Define a symbol [x] for any real number x by the equations

(67)    $[x] = x$ if $x \ge 0$, $[x] = 0$ if $x \le 0$.

With this notation, whatever be the relation between a, b, $\alpha$ and $\beta$, we have

(68)    $m(a, b \mid s) = 4 \int_{[\alpha-a]}^{\alpha} \int_{[\beta-b]}^{\beta} (t + a - \alpha)(\tau + b - \beta) \Bigl(1 - \frac{2\alpha\beta - t\tau}{R'}\Bigr)^s dt\, d\tau + \bigl\{a^2 [b - \beta]^2 + b^2 [a - \alpha]^2 - [a - \alpha]^2 [b - \beta]^2\bigr\} \Bigl(1 - \frac{2\alpha\beta}{R'}\Bigr)^s.$

We now allow s to take all values s = 0, 1, 2, ... with probabilities $P_s$ given by the generating function (1). Then it follows, from the form of (68), that

(69)    $m(a, b) = 4 \int_{[\alpha-a]}^{\alpha} \int_{[\beta-b]}^{\beta} (t + a - \alpha)(\tau + b - \beta)\, \psi\Bigl(1 - \frac{2\alpha\beta - t\tau}{R'}\Bigr) dt\, d\tau + \bigl\{a^2 [b - \beta]^2 + b^2 [a - \alpha]^2 - [a - \alpha]^2 [b - \beta]^2\bigr\}\, \psi\Bigl(1 - \frac{2\alpha\beta}{R'}\Bigr).$

On subtracting from this the square of the first moment of Y, which by (5) and (8) is

$a^2 b^2\, \psi^2\Bigl(1 - \frac{\alpha\beta}{R'}\Bigr),$

we obtain the variance $\sigma_Y^2$ of Y. But the variance of Y is necessarily equal to the variance $\sigma_X^2$ of X.

5. Particular cases. (i) $\psi_1(u) = u^s$. This is the case, considered originally, in which the number s of centers of the rectangles $\rho$ falling within R' is fixed. The explicit evaluation of the variance $\sigma_X^2$ depends in this case on the evaluation of the integral

(70)    $\int_{[\alpha-a]}^{\alpha} \int_{[\beta-b]}^{\beta} (t + a - \alpha)(\tau + b - \beta) \Bigl\{\Bigl(1 - \frac{2\alpha\beta}{R'}\Bigr) + \frac{t\tau}{R'}\Bigr\}^s dt\, d\tau.$

The evaluation is easy if one expands the binomial under the sign of the integral and integrates term by term. Each such integral is a product of two simple integrals.

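(ii) $\psi_2(u) = e^{\lambda R'(u-1)}$. Poisson case. This is the case where the probabilities $P_s$ that there are exactly s centers of rectangles $\rho$ within R' are given by the Poisson Law, $P_s = (\lambda R')^s e^{-\lambda R'}/s!$. Substituting the expression of the probability generating function into (69), we obtain for this case

(71)    $m(a, b) = 4 e^{-2\alpha\beta\lambda} \int_{[\alpha-a]}^{\alpha} \int_{[\beta-b]}^{\beta} (t + a - \alpha)(\tau + b - \beta) \sum_{s=0}^{\infty} \frac{(\lambda t \tau)^s}{s!}\, dt\, d\tau + e^{-2\alpha\beta\lambda} \bigl\{a^2 [b - \beta]^2 + b^2 [a - \alpha]^2 - [a - \alpha]^2 [b - \beta]^2\bigr\}.$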

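On performing the integration term by term, and contracting the first term of the resulting infinite series into the second line of equation (71), we readily obtain the result

(72)    $m(a, b) = 4 e^{-2\alpha\beta\lambda} \sum_{s=1}^{\infty} \frac{(\lambda\alpha\beta)^s}{s!} \frac{\alpha\beta}{(s+1)^2 (s+2)^2} \times \Bigl\{(s+2)a - \alpha + [\alpha - a]\Bigl(1 - \frac{a}{\alpha}\Bigr)^{s+1}\Bigr\} \times \Bigl\{(s+2)b - \beta + [\beta - b]\Bigl(1 - \frac{b}{\beta}\Bigr)^{s+1}\Bigr\} + e^{-2\alpha\beta\lambda} a^2 b^2,$

where [x] continues to have the meaning defined by (67). In virtue of equations (7) and (8), however, the last term of the expression (72) is precisely the square of the first moment of Y when s is Poisson distributed. Hence, for s Poisson distributed, we have the expression for the variance of Y and of X,

(73)    $\sigma_Y^2 = \sigma_X^2 = 4 e^{-2\alpha\beta\lambda} \sum_{s=1}^{\infty} \frac{(\lambda\alpha\beta)^s}{s!} \frac{\alpha\beta}{(s+1)^2 (s+2)^2} \times \Bigl\{(s+2)a - \alpha + [\alpha - a]\Bigl(1 - \frac{a}{\alpha}\Bigr)^{s+1}\Bigr\} \times \Bigl\{(s+2)b - \beta + [\beta - b]\Bigl(1 - \frac{b}{\beta}\Bigr)^{s+1}\Bigr\}.$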
(iii) $\psi_3(u) = e^{m(e^{\lambda R'(u-1)} - 1)}$. Contagious case. This is the case where the probabilities $P_s$ that there are exactly s centers of rectangles $\rho$ within R' are given by the contagious law of type A with two parameters.* The evaluation of the second moment of Y is made easy by noticing that the probability generating function appropriate to the contagious distribution may be expressed as a series in terms of the probability generating function of the Poisson Law

(74)    $\psi_3(u) = e^{-m} \sum_{k=0}^{\infty} \frac{m^k}{k!}\, \psi_2^k(u) = e^{-m} \sum_{k=0}^{\infty} \frac{m^k}{k!}\, e^{k\lambda R'(u-1)}.$

Thus the evaluation of the integral intervening in the formula for the second moment of Y is reduced in the present case to that of formula (71).


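6. Remarks on other cases. (i) It may be of interest, in amplification of H. E. Robbins' results, to exhibit the analogues of formulas (68), (69) and (73) in the one-dimensional case. For this case, then, if the interval a is embedded in a larger interval a', we obtain by similar methods, beginning with the calculation of $dm(a \mid s)/da$,

(75)    $m(a \mid s) = 2 \int_{[\alpha-a]}^{\alpha} (t + a - \alpha) \Bigl(1 - \frac{2\alpha - t}{a'}\Bigr)^s dt + [a - \alpha]^2 \Bigl(1 - \frac{2\alpha}{a'}\Bigr)^s,$

whence

(76)    $m(a) = 2 \int_{[\alpha-a]}^{\alpha} (t + a - \alpha)\, \psi\Bigl(1 - \frac{2\alpha - t}{a'}\Bigr) dt + [a - \alpha]^2\, \psi\Bigl(1 - \frac{2\alpha}{a'}\Bigr);$

in particular, if s is Poisson distributed,

(77)    $\sigma_X^2 = \sigma_Y^2 = 2 e^{-2\alpha\lambda} \sum_{s=1}^{\infty} \frac{(\alpha\lambda)^s}{s!} \frac{\alpha}{(s+1)(s+2)} \times \Bigl\{(s+2)a - \alpha + [\alpha - a]\Bigl(1 - \frac{a}{\alpha}\Bigr)^{s+1}\Bigr\}.$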


The close parallel between these formulas and those for two dimensions makes it natural to conjecture analogous formulas for n dimensions; but we have not attempted to establish such formulas.
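
As a hedged illustration of the one-dimensional Poisson case, the sketch below compares the series for the variance with a direct quadrature of the second moment minus the squared mean. It assumes the readings $m(a) = 2\int_{[\alpha-a]}^{\alpha}(t + a - \alpha)e^{-\lambda(2\alpha - t)}\,dt + [a-\alpha]^2 e^{-2\alpha\lambda}$ for (76) with the Poisson generating function, first moment $a e^{-\alpha\lambda}$, and exponent $s+1$ in the series (77); the function names and the numerical values are illustrative only.

```python
import math


def bracket(x):
    """The symbol [x] of (67): x when x is positive, otherwise 0."""
    return x if x > 0 else 0.0


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def variance_series(a, alpha, lam, terms=80):
    """Series form of the one-dimensional Poisson variance, read from (77)."""
    total = 0.0
    for s in range(1, terms + 1):
        weight = (alpha * lam) ** s / math.factorial(s)
        brace = (s + 2) * a - alpha + bracket(alpha - a) * (1 - a / alpha) ** (s + 1)
        total += weight * alpha / ((s + 1) * (s + 2)) * brace
    return 2.0 * math.exp(-2.0 * alpha * lam) * total


def variance_quadrature(a, alpha, lam):
    """Second moment (76) with the Poisson generating function, minus the squared mean."""
    integrand = lambda t: (t + a - alpha) * math.exp(-lam * (2 * alpha - t))
    second = 2.0 * simpson(integrand, bracket(alpha - a), alpha)
    second += bracket(a - alpha) ** 2 * math.exp(-2.0 * alpha * lam)
    mean = a * math.exp(-alpha * lam)  # assumed first moment of Y in one dimension
    return second - mean ** 2


# One case with a >= alpha and one with a < alpha.
cases = [(1.5, 1.0, 0.7), (0.6, 1.0, 0.7)]
pairs = [(variance_series(*c), variance_quadrature(*c)) for c in cases]
```

Both routes give the same value in each case, which supports the reading of the exponent as $s+1$ under the assumptions stated above.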

(ii) For the evaluation of the higher moments of Y it may be useful to notice 
that precisely the same method as that described above leads to the conclusion 
that the derivative of the n-th non central moment of Y is 


(78) 


d 2 m n {a, b) 
dadb 


lira J nE(X H - 1 TT) + n(n - 1 )E( l X n ~ 1 UV)}. 

Aa,Ab-+0 AaAO 


* J. Neyman, “On a new class of contagious distributions”. Annals of Math. Stat., 
Vol. 10 (1939), pp. 35-87. 



ON THE MEASURE OF A RANDOM SET. II

By H. E. Robbins

Postgraduate School, U. S. Naval Academy

1. Introduction. In a recent paper¹ the author derived general formulas for the moments of the measure of any random set X, and applied the formulas to find the mean and variance of a random sum of intervals on the line. In a subsequent paper² J. Bronowski and J. Neyman, using other methods, found the variance when X is a random sum of rectangles in the plane, and raised the question of finding the variance when X is a random sum of n-dimensional intervals in n-space. This will be done in the present paper, independently of the work of Bronowski and Neyman, using the methods of (I). The corresponding problem for circles in the plane will also be solved.

2. n-dimensional intervals, N fixed. Let the random set X be defined as follows. Let $A_i$, $a_i$ (the range of the subscript i throughout this paper will be from 1 to n) and $\delta$ be fixed positive numbers such that $a_i \le 2\delta$. Let R denote the n-dimensional interval consisting of all points $(x_1, \cdots, x_n)$ such that $0 \le x_i \le A_i$, and let R' denote the larger interval for which $-\delta \le x_i \le A_i + \delta$ (and also its measure $\prod (A_i + 2\delta)$). Let a fixed number N of intervals with sides $a_i$ parallel to the axes be chosen independently, with the probability density function for the center of each interval constant and equal to 1/R' in R'. The set X is the intersection of the set-theoretical sum of the N intervals with R. The set Y consists of those points of R that do not belong to X. We have identically

(1)    $X + Y = R,$

where capital letters denote either sets or their measures.

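From (I), equation (15), we have

(2)    $E(Y) = \int_0^{A_n} \cdots \int_0^{A_1} p(x_1, \cdots, x_n)\, dx_1 \cdots dx_n,$

where, setting $r = \prod a_i$, we have

(3)    $p(x_1, \cdots, x_n) = \Pr((x_1, \cdots, x_n) \in Y) = \Bigl(1 - \frac{r}{R'}\Bigr)^N.$

Hence

(4)    $E(Y) = R\Bigl(1 - \frac{r}{R'}\Bigr)^N.$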
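
Formula (4) lends itself to a quick simulation check in one dimension (n = 1, so r = a and R = A). The sketch below is an illustration only: the parameter values, the seed and the helper names are assumptions made for the example, and the Monte Carlo estimate agrees with (4) only up to sampling error.

```python
import random


def uncovered_length(A, a, centers):
    """Length of the part of [0, A] not covered by intervals of length a at the given centers."""
    segments = sorted((c - a / 2.0, c + a / 2.0) for c in centers)
    covered, reach = 0.0, 0.0  # reach = right end of the part of [0, A] covered so far
    for left, right in segments:
        left, right = max(left, reach), min(right, A)
        if right > left:
            covered += right - left
            reach = right
    return A - covered


def simulate_mean_uncovered(A, a, delta, N, trials, seed=0):
    """Monte Carlo estimate of E(Y) with centers uniform on [-delta, A + delta]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        centers = [rng.uniform(-delta, A + delta) for _ in range(N)]
        total += uncovered_length(A, a, centers)
    return total / trials


A, a, delta, N = 1.0, 0.2, 0.2, 5           # illustrative values with a <= 2 * delta
R_prime = A + 2 * delta                      # measure of the larger interval R'
exact = A * (1 - a / R_prime) ** N           # formula (4) with n = 1
estimate = simulate_mean_uncovered(A, a, delta, N, trials=20000)
```

The condition $a_i \le 2\delta$ matters here: it guarantees that every point of R is covered by a single random interval with probability exactly r/R', which is what makes (3) independent of the point.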

1 H. E. Robbins, "On the measure of a random set," Annals of Math. Stat., Vol. 15 (1944), pp. 70-74. We shall refer to this paper as (I).

2 J. Bronowski and J. Neyman, "On the variance of a random set," Annals of Math. Stat., Vol. 16 (1945), pp. 330-341. We shall refer to this paper as (BN).
From (1) it follows that

(5)    $E(X) = R\Bigl\{1 - \Bigl(1 - \frac{r}{R'}\Bigr)^N\Bigr\}.$

From (I), equation (21), we have

(6)    $E(Y^2) = \int_0^{A_n} \cdots \int_0^{A_1} \int_0^{A_n} \cdots \int_0^{A_1} p(x_1, \cdots, x_n, y_1, \cdots, y_n)\, dx_1 \cdots dx_n\, dy_1 \cdots dy_n,$

where

(7)    $p(x_1, \cdots, x_n, y_1, \cdots, y_n) = \Pr((x_1, \cdots, x_n) \in Y$ and $(y_1, \cdots, y_n) \in Y).$

It is clear from the symmetry of the problem that the distribution of Y will be unchanged if we assume that for all i, $x_i \le y_i$. Hence, since there are $2^n$ possible sets of n inequalities each, we can write

(8)    $E(Y^2) = 2^n \int_0^{A_n} \cdots \int_0^{A_1} \int_0^{y_n} \cdots \int_0^{y_1} p\; dx_1 \cdots dx_n\, dy_1 \cdots dy_n.$

We now introduce the new variables of integration

(9)    $u_i = x_i, \quad v_i = y_i - x_i,$

for which

(10)    $\frac{\partial(u_1, \cdots, u_n, v_1, \cdots, v_n)}{\partial(x_1, \cdots, x_n, y_1, \cdots, y_n)} = 1.$

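In terms of the new variables we have

(11)    $p = f(v_1, \cdots, v_n) = \Bigl(1 - \frac{2r}{R'}\Bigr)^N$ if $v_i > a_i$ for some i, and $p = \Bigl(1 - \frac{2r - \prod (a_i - v_i)}{R'}\Bigr)^N$ if $v_i \le a_i$ for all i.

Equation (8) now becomes

(12)    $E(Y^2) = 2^n \int_0^{A_n} \cdots \int_0^{A_1} \int_0^{A_n - v_n} \cdots \int_0^{A_1 - v_1} f\; du_1 \cdots du_n\, dv_1 \cdots dv_n = 2^n \int_0^{A_n} \cdots \int_0^{A_1} f \prod (A_i - v_i)\, dv_1 \cdots dv_n.$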

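Let $z_i = \min(a_i, A_i)$. Then from (11) and (12) we obtain

(13)    $E(Y^2) = 2^n \int_0^{z_n} \cdots \int_0^{z_1} \Bigl(1 - \frac{2r - \prod (a_i - v_i)}{R'}\Bigr)^N \prod (A_i - v_i)\, dv_1 \cdots dv_n + 2^n \Bigl(1 - \frac{2r}{R'}\Bigr)^N \Bigl\{ \int_0^{A_n} \cdots \int_0^{A_1} \prod (A_i - v_i)\, dv_1 \cdots dv_n - \int_0^{z_n} \cdots \int_0^{z_1} \prod (A_i - v_i)\, dv_1 \cdots dv_n \Bigr\}.$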
Let the symbol [x], as in (BN), be defined by

(14)    $[x] = x$ if $x > 0$, $[x] = 0$ if $x \le 0$.

In the integral in the first line of (13) we introduce the new variables of integration $w_i = a_i - v_i$, while in the two integrals in the second line we introduce the variables $s_i = A_i - v_i$. The result is

(15)    $E(Y^2) = 2^n \int_{[a_n - A_n]}^{a_n} \cdots \int_{[a_1 - A_1]}^{a_1} \Bigl(1 - \frac{2r - \prod w_i}{R'}\Bigr)^N \prod (w_i + A_i - a_i)\, dw_1 \cdots dw_n + 2^n \Bigl(1 - \frac{2r}{R'}\Bigr)^N \Bigl\{ \int_0^{A_n} \cdots \int_0^{A_1} \prod s_i\, ds_1 \cdots ds_n - \int_{[A_n - a_n]}^{A_n} \cdots \int_{[A_1 - a_1]}^{A_1} \prod s_i\, ds_1 \cdots ds_n \Bigr\}$

$= 2^n \int_{[a_n - A_n]}^{a_n} \cdots \int_{[a_1 - A_1]}^{a_1} \Bigl(1 - \frac{2r - \prod w_i}{R'}\Bigr)^N \prod (w_i + A_i - a_i)\, dw_1 \cdots dw_n + \Bigl(1 - \frac{2r}{R'}\Bigr)^N \Bigl\{ \prod A_i^2 - \prod \bigl(A_i^2 - [A_i - a_i]^2\bigr) \Bigr\}.$

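From (1) we see that $E(X^2) - E^2(X) = E(Y^2) - E^2(Y)$. Thus from (4) and (15) we have

(16)    $\sigma_X^2 = 2^n \int_{[a_n - A_n]}^{a_n} \cdots \int_{[a_1 - A_1]}^{a_1} \Bigl(1 - \frac{2r - \prod w_i}{R'}\Bigr)^N \prod (w_i + A_i - a_i)\, dw_1 \cdots dw_n + \Bigl(1 - \frac{2r}{R'}\Bigr)^N \Bigl\{ \prod A_i^2 - \prod \bigl(A_i^2 - [A_i - a_i]^2\bigr) \Bigr\} - R^2 \Bigl(1 - \frac{r}{R'}\Bigr)^{2N}.$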
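
For n = 1 the variance formula (16) reduces to a single integral, which makes a small numerical sanity check possible. The sketch below assumes the reading $\sigma_X^2 = E(Y^2) - E^2(Y)$ with $E(Y^2)$ taken from (15) and $E(Y)$ from (4); the helper names and the parameter values are illustrative, and Simpson's rule is an arbitrary choice of quadrature.

```python
def bracket(x):
    """The symbol [x] of (14): x when x is positive, otherwise 0."""
    return x if x > 0 else 0.0


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def variance_fixed_n(A, a, R_prime, N):
    """Variance of X for n = 1 (so r = a and R = A), following (15), (4) and (16)."""
    integrand = lambda w: (1 - (2 * a - w) / R_prime) ** N * (w + A - a)
    second_moment = 2 * simpson(integrand, bracket(a - A), a)
    second_moment += (1 - 2 * a / R_prime) ** N * (A ** 2 - (A ** 2 - bracket(A - a) ** 2))
    mean_y = A * (1 - a / R_prime) ** N
    return second_moment - mean_y ** 2


var_no_intervals = variance_fixed_n(1.0, 0.2, 1.4, 0)    # nothing random when N = 0
var_five_intervals = variance_fixed_n(1.0, 0.2, 1.4, 5)
```

With N = 0 the set Y is all of R, so the variance must vanish; this checks that the bracket term and the integral combine to $R^2$ as the derivation requires.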



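3. n-dimensional intervals, N variable. Now let X and Y be defined as before except that the number N is taken as a random variable, capable of assuming the values 0, 1, ... with respective probabilities $P_0, P_1, \cdots$, and with generating function

(17)    $\varphi(t) = \sum_{N=0}^{\infty} P_N t^N.$

Then from (5) we have

(18)    $E(X) = R \sum_{N=0}^{\infty} P_N \Bigl\{1 - \Bigl(1 - \frac{r}{R'}\Bigr)^N\Bigr\} = R\Bigl\{1 - \varphi\Bigl(1 - \frac{r}{R'}\Bigr)\Bigr\},$

while from (15) we have

(19)    $\sigma_X^2 = E(Y^2) - E^2(Y) = 2^n \int_{[a_n - A_n]}^{a_n} \cdots \int_{[a_1 - A_1]}^{a_1} \varphi\Bigl(1 - \frac{2r - \prod w_i}{R'}\Bigr) \prod (w_i + A_i - a_i)\, dw_1 \cdots dw_n + \varphi\Bigl(1 - \frac{2r}{R'}\Bigr) \Bigl\{ \prod A_i^2 - \prod \bigl(A_i^2 - [A_i - a_i]^2\bigr) \Bigr\} - R^2 \varphi^2\Bigl(1 - \frac{r}{R'}\Bigr).$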

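In particular, suppose that, as in (BN), N has a Poisson distribution with a parameter $\lambda$,

(20)    $P_N = \frac{(\lambda R')^N e^{-\lambda R'}}{N!},$

so that

(21)    $\varphi(t) = e^{\lambda R'(t-1)}.$

Then (18) becomes

(22)    $E(X) = R\{1 - e^{-\lambda r}\},$

while (19) becomes

(23)    $\sigma_X^2 = 2^n e^{-2\lambda r} \int_{[a_n - A_n]}^{a_n} \cdots \int_{[a_1 - A_1]}^{a_1} \Bigl\{ \sum_{N=0}^{\infty} \frac{(\lambda \prod w_i)^N}{N!} \Bigr\} \Bigl\{ \prod (w_i + A_i - a_i) \Bigr\}\, dw_1 \cdots dw_n + e^{-2\lambda r} \Bigl\{ \prod A_i^2 - \prod \bigl(A_i^2 - [A_i - a_i]^2\bigr) \Bigr\} - R^2 e^{-2\lambda r}.$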


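Integrating term by term and simplifying the resulting expression, we obtain finally

(24)    $\sigma_X^2 = r \cdot 2^n \cdot e^{-2\lambda r} \sum_{N=1}^{\infty} \frac{(\lambda r)^N}{N!\, \{(N+1)(N+2)\}^n} \prod \Bigl\{ (N+2)A_i - a_i + [a_i - A_i]\Bigl(1 - \frac{A_i}{a_i}\Bigr)^{N+1} \Bigr\}.$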

4. Circles in the plane. Let the random set X be defined as follows. Let $A_1$, $A_2$, a, and $\delta$ be fixed positive numbers such that $2a < \min(A_1, A_2, 2\delta)$. Let R denote the rectangle consisting of all points $(x_1, x_2)$ such that $0 \le x_1 \le A_1$, $0 \le x_2 \le A_2$, and let R' denote the larger rectangle for which $-\delta \le x_1 \le A_1 + \delta$, $-\delta \le x_2 \le A_2 + \delta$. Let a fixed number N of circles with radii a and areas $b = \pi a^2$ be chosen independently, with the probability density function for the center of each circle constant and equal to 1/R' in R'. The set X is the intersection of the set-theoretical sum of the N circles with R. The set Y consists of those points of R that do not belong to X. Equation (1) holds as before. The analogue of (4) is

(25)    $E(Y) = \int_0^{A_2} \int_0^{A_1} p(x_1, x_2)\, dx_1\, dx_2 = R\Bigl(1 - \frac{b}{R'}\Bigr)^N,$

while (8) becomes

(26)    $E(Y^2) = 4 \int_0^{A_2} \int_0^{A_1} \int_0^{y_2} \int_0^{y_1} p(x_1, x_2, y_1, y_2)\, dx_1\, dx_2\, dy_1\, dy_2,$

where

(27)    $p(x_1, x_2, y_1, y_2) = \Pr((x_1, x_2) \in Y$ and $(y_1, y_2) \in Y).$

Introducing the new variables (9) we obtain the analogue of (12),

(28)    $E(Y^2) = 4 \int_0^{A_2} \int_0^{A_1} f\, (A_2 - v_2)(A_1 - v_1)\, dv_1\, dv_2,$

where, setting $r = (v_1^2 + v_2^2)^{1/2}$,

(29)    $f(v_1, v_2) = \Bigl(1 - \frac{2b}{R'}\Bigr)^N$ if $r \ge 2a$, and $f(v_1, v_2) = \Bigl(1 - \frac{2b - 2a^2 \arccos(r/2a) + \frac{1}{2} r \sqrt{4a^2 - r^2}}{R'}\Bigr)^N$ if $r < 2a$.

Introducing polar coordinates r, $\theta$ in the $v_1, v_2$-plane and carrying out the obvious integrations, we obtain

(30)    $E(Y^2) = \Bigl(1 - \frac{2b}{R'}\Bigr)^N \Bigl\{ R^2 + \frac{32}{3} a^3 (A_1 + A_2) - 8a^4 - 4bR \Bigr\} + 8a^2 \int_0^1 \bigl(\pi R t + 4a^2 t^3 - 4a(A_1 + A_2) t^2\bigr) \Bigl(1 - \frac{2b - 2a^2 \arccos t + 2a^2 t \sqrt{1 - t^2}}{R'}\Bigr)^N dt.$
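
A hedged numerical sketch of formula (30) follows. It assumes the reading of (30) in which the bracketed term is $R^2 + \tfrac{32}{3}a^3(A_1 + A_2) - 8a^4 - 4bR$ and the integrand carries the area $2b - 2a^2 \arccos t + 2a^2 t\sqrt{1 - t^2}$ covered by two circles whose centers are 2at apart; the helper names and parameter values are illustrative. Two consequences are computed: for N = 0 the second moment must equal $R^2$, and for any N it must lie between $E^2(Y)$ and $R \cdot E(Y)$.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def second_moment_circles(A1, A2, a, delta, N):
    """E(Y^2) for N random circles of radius a, following the reading of (30) above."""
    R = A1 * A2
    R_prime = (A1 + 2 * delta) * (A2 + 2 * delta)
    b = math.pi * a * a

    def union_area(t):
        # Area covered by two circles of radius a whose centers are 2*a*t apart.
        return 2 * b - 2 * a * a * math.acos(t) + 2 * a * a * t * math.sqrt(1 - t * t)

    far = (1 - 2 * b / R_prime) ** N * (
        R * R + 32.0 / 3.0 * a ** 3 * (A1 + A2) - 8 * a ** 4 - 4 * b * R
    )
    weight = lambda t: math.pi * R * t + 4 * a * a * t ** 3 - 4 * a * (A1 + A2) * t * t
    near = 8 * a * a * simpson(
        lambda t: weight(t) * (1 - union_area(t) / R_prime) ** N, 0.0, 1.0
    )
    return far + near


A1, A2, a, delta = 1.0, 1.0, 0.1, 0.2        # satisfies 2a < min(A1, A2, 2*delta)
R = A1 * A2
R_prime = (A1 + 2 * delta) * (A2 + 2 * delta)
m2_zero = second_moment_circles(A1, A2, a, delta, 0)
m2_four = second_moment_circles(A1, A2, a, delta, 4)
mean_four = R * (1 - math.pi * a * a / R_prime) ** 4   # formula (25)
```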

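If now N is a random variable with generating function (17), then (25) becomes

(31)    $E(Y) = R\, \varphi\Bigl(1 - \frac{b}{R'}\Bigr),$

and hence

(32)    $E(X) = R\Bigl\{1 - \varphi\Bigl(1 - \frac{b}{R'}\Bigr)\Bigr\},$

(33)    $\sigma_X^2 = E(X^2) - E^2(X) = E(Y^2) - E^2(Y) = \varphi\Bigl(1 - \frac{2b}{R'}\Bigr) \Bigl\{ R^2 + \frac{32}{3} a^3 (A_1 + A_2) - 8a^4 - 4bR \Bigr\} - R^2 \varphi^2\Bigl(1 - \frac{b}{R'}\Bigr) + 8a^2 \int_0^1 \bigl(\pi R t + 4a^2 t^3 - 4a(A_1 + A_2) t^2\bigr)\, \varphi\Bigl(1 - \frac{2b - 2a^2 \arccos t + 2a^2 t \sqrt{1 - t^2}}{R'}\Bigr) dt.$
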

SAMPLING FROM A CHANGING POPULATION¹, ²

By Reinhold Baer

University of Illinois

1. Introduction. If, in sampling a certain population, it is impossible to take more than one sample at any given time, and if the population changes between any two samples, then we are confronted with the following mathematical situation. For every³ t, $0 \le t \le 1$, there is given a distribution⁴ (= population) D(t). Let furthermore $t_j$ be, for $0 < j \le n$, a number between $(j-1)/n$ and $j/n$; and assume that $x_j$ is a sample taken from the population $D(t_j)$. We denote by $T_n$ the set of the numbers $t_1, \cdots, t_n$ and by $O(T_n)$ the sample consisting of the $x_j$; and we assume that $O(T_n)$ is a random sample, i.e. that $x_1, \cdots, x_n$ are independent variables. The question arises how to get information concerning the family D(t) from the sample $O(T_n)$. It is clearly hopeless to try for information concerning an individual D(t) or even some $D(t_j)$ or the statistics that may be derived from them. But we may hope for information in the mean, if we assume that the family D(t) is in some sense continuous in t. To make this statement more precise we denote by a(t) the average and by $M_i(t)$ the i-th moment of D(t) around its average. We assume then that a(t) and $M_i(t)$, for $i \le 8$, exist and are continuous functions of t, and in section 7 we shall have to assume furthermore that a(t) and $M_2(t)$ are functions of bounded variation. These hypotheses assure the existence of

the mean average $a = \int_0^1 a(t)\, dt$

and the mean i-th moment $M_i = \int_0^1 M_i(t)\, dt$

for $i \le 8$. Clearly we may hope for information concerning a and $M_i$ from the random sample $O(T_n)$. It is our object to discuss certain more or less well known statistics of the sample $O(T_n)$, and to determine their stochastic limits⁵.

1 Presented to the American Mathematical Society, September 15, 1945.

2 The author is indebted to Dr. E. L. Welker for checking the results, in particular those rather obnoxious computations needed in sections 6 and 7 which the author did not incorporate into this paper.

3 It constitutes a restriction of generality that we consider finite closed intervals only. But it is no further loss in generality to use the interval from 0 to 1, and this choice certainly simplifies notations.

4 Comparatively little will be assumed of these distributions. These properties will be enumerated in Section 2.

5 See [2], p. 81 and the criterion 2.d. of section 2.

Jo 
As an illustration we mention the following results which will be obtained in the course of this investigation (among others):⁶

$\bar{x} = n^{-1} \sum_{j=1}^{n} x_j$ converges stochastically to the mean average a;

$s^2 = n^{-1} \sum_{j=1}^{n} (x_j - \bar{x})^2$ converges stochastically to $M_2 + \int_0^1 (a(t) - a)^2\, dt$;

$d^2 = (2n)^{-1} \sum_{j=1}^{n-1} (x_j - x_{j+1})^2$ converges stochastically to the mean variance $M_2$.

It is clear that $M_2$ is the stochastic limit of $s^2$ if, and only if, a(t) is constant. If a(t) is not constant, then $s^2$ is not a consistent estimate⁷ of $M_2$, and will have to be rejected, at least for large n, in favor of $d^2$ which is always a consistent estimate of $M_2$.
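
The contrast between $s^2$ and $d^2$ can be illustrated by simulation. The sketch below is an assumption-laden example, not part of the paper: it takes D(t) normal with a(t) = 3t and $M_2(t) = 1$, chooses $t_j$ at the midpoint of its interval, and fixes a seed. Under these choices the stated limits are $1 + \int_0^1 (3t - 1.5)^2\, dt = 1.75$ for $s^2$ and 1 for $d^2$.

```python
import random


def s2_and_d2(xs):
    """Return (s^2, d^2): the sample variance with divisor n and half the mean square successive difference."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n
    d2 = sum((xs[j] - xs[j + 1]) ** 2 for j in range(n - 1)) / (2 * n)
    return s2, d2


def drifting_sample(n, seed=0):
    """x_j drawn from a normal D(t_j) with a(t) = 3t, M_2(t) = 1, t_j = (j - 1/2)/n (illustrative choices)."""
    rng = random.Random(seed)
    return [rng.gauss(3.0 * (j - 0.5) / n, 1.0) for j in range(1, n + 1)]


s2, d2 = s2_and_d2(drifting_sample(50000))
```

In this example $s^2$ absorbs the variance of the drifting average a(t), while $d^2$ does not, because neighbouring observations have nearly equal averages.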

It was this last point that led us into this investigation. Recently the statistic $d^2$ has found much attention; and the question arose as to why the statistic $s^2$ should be rejected in favor of $d^2$. Reading the illuminating introduction of the fundamental paper [1], one sees that just such a situation as we have attempted to describe here in somewhat abstract terms has necessitated the use of $d^2$. Consequently our result may be considered a theoretical justification for this procedure.

Our other results will be discussed in their interrelation as they are obtained. It should be noted that all our results concern themselves with stochastic convergence, and thus they justify the use of a sample function as an estimate of some statistical number only for sufficiently large size n of the sample. Thus it is quite possible that for small n other functions provide better estimates. The practical applicability of our results depends, therefore, on a criterion for n to be sufficiently large, and unfortunately such a criterion is not yet available.

2. Notations and fundamental properties. We have not stated in the Introduction the hypotheses to which we subject the distributions under consideration. For our investigation we shall need only very few properties of distributions. Thus we are going to enumerate now some properties of distributions which we are going to use, and we shall assume throughout that these properties are satisfied. As will be seen these hypotheses are rather weak and are satisfied by a large class of distributions.

If x is any stochastic variable, then we denote by E(x) its mathematical expectation, and the only properties of stochastic variables that concern us are properties of their expectations. E(x) is a linear operation satisfying E(1) = 1.

6 It should be noted that the stochastic limit of the following statistics would not be changed, if we substituted for the denominator n of $s^2$ the denominator n - 1 which is often used, and if we allowed the summation in the expression for $d^2$ to range from 1 to n, defining $x_{n+1}$ as $x_1$.

7 Wilks [2], p. 188.
If furthermore $x_1, \cdots, x_n$ are independent variables, and if the function f depends on some of these variables whereas g depends only on the others, then $E(fg) = E(f)E(g)$, and this property may serve as a definition of independence.

As stated in the Introduction we are going to study a family D(t) of distributions, for $0 \le t \le 1$. If x is the stochastic variable of the distribution D(t) for some fixed t, then we let

$a(t) = E(x)$ and $M_i(t) = E((x - a(t))^i)$.

We shall assume throughout that the average a(t) and the variance $M_2(t)$ exist for every t, and that a(t) and $M_2(t)$ are continuous functions of t. Moreover, when discussing $M_i(t)$, $1 < i \le 4$, we shall assume that every $M_j(t)$ with $j \le 2i$ is a continuous function of t. Thus we are sure that the mean average a and the mean variance $M_2$, as defined in the Introduction, always exist, and the mean i-th moment $M_i$ exists, whenever $M_i(t)$ is a continuous function of t.

Remark: If the mean i-th moment $M_i$ exists for every i, then one may be tempted to consider as the mean of the family D(t) a distribution D with average a and i-th moment $M_i$, provided such a distribution exists. But this has to be done with some caution. For suppose that every D(t) is normal. Then $M_i(t) = 0$ for every odd i, implying $M_i = 0$ for odd i so that D would be symmetric. But $M_{2i}(t) = 1 \cdot 3 \cdots (2i - 1)\, M_2(t)^i$ and hence $M_{2i} = 1 \cdot 3 \cdots (2i - 1) \int_0^1 M_2(t)^i\, dt$, and the integral will be the i-th power of $M_2$ only if $M_2(t)$ is constant. Thus the mean distribution D of a continuous family of normal distributions need not be normal.
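
A small exact computation makes the remark concrete. The sketch assumes, purely for illustration, a normal family with $M_2(t) = 1 + t$; the helper functions are hypothetical and work with polynomial coefficient lists.

```python
from fractions import Fraction


def integrate_poly(coeffs):
    """Integral over [0, 1] of the polynomial sum(coeffs[k] * t**k)."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(coeffs))


def poly_mul(p, q):
    """Coefficient list of the product of two polynomials."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


M2_t = [1, 1]                                        # M_2(t) = 1 + t (illustrative choice)
mean_M2 = integrate_poly(M2_t)                       # mean variance M_2
mean_M4 = 3 * integrate_poly(poly_mul(M2_t, M2_t))   # mean fourth moment, 3 * integral of M_2(t)^2
normal_M4 = 3 * mean_M2 ** 2                         # fourth moment of a normal law with variance M_2
```

Here the mean fourth moment is 7 while a normal distribution with the mean variance 3/2 would have fourth moment 27/4, so no normal distribution has the mean moments of this family.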

As in the Introduction we now let $t_i$ be some number between $(i-1)/n$ and $i/n$, and denote by $x_i$ a sample taken from the distribution $D(t_i)$. We denote by $T_n$ the set of the n numbers $t_i$ and by $O(T_n)$ the sample consisting of the $x_i$. It will be assumed throughout that $O(T_n)$ is a random sample, i.e. we shall assume that $x_1, \cdots, x_n$ are independent variables.

We are not going to make any use of the customary definition of stochastic convergence⁸ (and we shall therefore not restate it). Instead we are going to apply throughout the following criterion⁹, ¹⁰:

2.d. The function $f(O(T_n))$ of the sample $O(T_n)$ converges stochastically to the number r, if

$\lim_{n \to \infty} E(f(O(T_n))) = r$ and $\lim_{n \to \infty} E([f(O(T_n)) - E(f(O(T_n)))]^2) = 0.$

All the sample functions considered will be polynomials of the variables $x_1, \cdots, x_n$.

8 Wilks [2], p. 81.

9 Wilks [2], Theorem (A), p. 134.

10 The validity of criterion 2.d. implies stochastic convergence in the customary sense. Thus, all results obtained in the present paper remain valid also when the customary definition of stochastic convergence is adopted.
3. The mean average. Though the discussion of this section is rather obvious, we give the details, since they may serve as a convenient introduction to the type of argument we have to use throughout.

Theorem. $\bar{x}$ converges stochastically to a.

Proof: We note first that $E(\bar{x}) = n^{-1} \sum_{j=1}^{n} E(x_j) = n^{-1} \sum_{j=1}^{n} a(t_j)$. Since $t_j$ is between $(j-1)/n$ and $j/n$, and since $n^{-1}$ is the length of this interval, it follows from the continuity of a(t) that

$\int_0^1 a(t)\, dt = \lim_{n \to \infty} n^{-1} \sum_{j=1}^{n} a(t_j);$

and thus we have shown that $E(\bar{x})$ tends to a as n tends to infinity.

Next we find that

$E((\bar{x} - E(\bar{x}))^2) = n^{-2} E\Bigl(\Bigl[\sum_{j=1}^{n} (x_j - a(t_j))\Bigr]^2\Bigr) = n^{-2} \sum_{j=1}^{n} E((x_j - a(t_j))^2) = n^{-2} \sum_{j=1}^{n} M_2(t_j),$

since $E((x_j - a(t_j))(x_h - a(t_h))) = E(x_j - a(t_j))\, E(x_h - a(t_h)) = 0$ for $j \ne h$. But $M_2(t)$ is, for $0 \le t \le 1$, a bounded non-negative function, showing that $E((\bar{x} - E(\bar{x}))^2)$ tends to 0 as n tends to infinity. Applying 2.d. we find that $\bar{x}$ converges stochastically to a, as we intended to show.

Remark: It is clear that the speed of the stochastic convergence of $\bar{x}$ to a depends on two factors:

(i) the goodness of $\bar{x}$ as an estimate of $E(\bar{x})$;

(ii) the speed of convergence of the sums $n^{-1} \sum_{j=1}^{n} a(t_j)$ to the integral $a = \int_0^1 a(t)\, dt$.

It is this difficulty which expresses itself in (ii) and which makes the present type of statistical estimation less effective than the one concerned with sampling from one distribution only. As to (i), it is again, as may be seen from the proof, of the order of magnitude $(M_2/n)^{1/2}$ (see Theorem 1, section 4).

It is probable that $\bar{x}$ is a better estimate of $E(\bar{x})$ than of a. But this does not help, since the former depends on the particular choice of $T_n$.
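
Factor (ii) of the remark can be made tangible with a short computation. The sketch below uses an illustrative average $a(t) = t^2$ (an assumption for the example, with integral 1/3) and two hypothetical choices of $T_n$: every $t_j$ at the left end of its interval, and every $t_j$ at the midpoint.

```python
def tagged_mean(a, n, position):
    """n^{-1} * sum of a(t_j), with t_j = (j - 1 + position)/n and 0 <= position <= 1."""
    return sum(a((j - 1 + position) / n) for j in range(1, n + 1)) / n


a = lambda t: t * t                  # illustrative average; its integral over [0, 1] is 1/3
n = 100
err_left = abs(tagged_mean(a, n, 0.0) - 1.0 / 3.0)   # t_j at the left end of each interval
err_mid = abs(tagged_mean(a, n, 0.5) - 1.0 / 3.0)    # t_j at the midpoint of each interval
```

With n = 100 the left-end choice misses the integral by roughly 1/(2n), while the midpoint choice misses it by roughly $1/(12 n^2)$, so the bias of $E(\bar{x})$ as an estimate of a depends strongly on where in its interval each sample is taken.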

4. The variance. Theorem 1. $d^2$ converges stochastically to $M_2$.

Proof: We note first that

$E((x_j - x_{j+1})^2) = E([(x_j - a(t_j)) + (a(t_j) - a(t_{j+1})) + (a(t_{j+1}) - x_{j+1})]^2) = M_2(t_j) + (a(t_j) - a(t_{j+1}))^2 + M_2(t_{j+1}),$

since $E((x_j - a(t_j))(x_{j+1} - a(t_{j+1}))) = E(x_j - a(t_j))\, E(x_{j+1} - a(t_{j+1})) = 0$, E(const) = const and $E((x_i - a(t_i))^2) = M_2(t_i)$. Hence

$E(d^2) = (2n)^{-1}(A + B - C),$

where $A = 2 \sum_{j=1}^{n} M_2(t_j)$, $B = \sum_{j=1}^{n-1} (a(t_j) - a(t_{j+1}))^2$, $C = M_2(t_1) + M_2(t_n)$. Since $t_j$ is a value between $(j-1)/n$ and $j/n$, and since $n^{-1}$ is the length of this interval, it follows from the continuity of the function $M_2(t)$ that $M_2 = \int_0^1 M_2(t)\, dt = \lim_{n \to \infty} (2n)^{-1} A$. Since $M_2(t)$ is bounded as a continuous function, it follows that $(2n)^{-1} C$ tends to 0 as n tends to infinity. Finally we infer from the continuity of a(t), which is used here for the first time to its full extent, that there exists to every given positive $\epsilon$ an integer $N = N(\epsilon)$ such that $(a(t') - a(t''))^2 < \epsilon$ for $|t' - t''| < 2N^{-1}$. Thus for $N(\epsilon) < n$ we have $(a(t_j) - a(t_{j+1}))^2 < \epsilon$ and $(2n)^{-1} B < \frac{n-1}{2n}\, \epsilon$. Hence $(2n)^{-1} B$ tends to 0 as n tends to infinity, and we have shown that $E(d^2)$ tends to $M_2$ as n tends to infinity.

Next we note that

$E((d^2 - E(d^2))^2) = E(d^4) - E(d^2)^2 = (2n)^{-2} \sum_{i,j} [E((x_i - x_{i+1})^2 (x_j - x_{j+1})^2) - E((x_i - x_{i+1})^2)\, E((x_j - x_{j+1})^2)].$

But if both i and i + 1 are different from j and j + 1, then $E((x_i - x_{i+1})^2 (x_j - x_{j+1})^2) = E((x_i - x_{i+1})^2)\, E((x_j - x_{j+1})^2)$, and thus there are not more than 3n summands in the above summation that are not identically 0. These summands, however, depend only on $a(t_k)$, $M_2(t_k)$, $M_3(t_k)$ and $M_4(t_k)$, and they are therefore bounded. Thus $E((d^2 - E(d^2))^2)$ is equal to $(2n)^{-2}$ times a sum of not more than 3n summands which are bounded. Hence $E((d^2 - E(d^2))^2)$ tends to 0, as n tends to infinity. Now our theorem is an immediate consequence of the criterion 2.d.

Theorem 2. $s^2$ converges stochastically to $M_2 + \int_0^1 (a(t) - a)^2\, dt$.

Proof: We note first that $n(x_j - \bar{x}) = \sum_{h=1}^{n} (x_j - x_h)$ and that therefore $s^2 = n^{-3} \sum_{j=1}^{n} \sum_{h,k} (x_j - x_h)(x_j - x_k)$. Since $x_j - x_h = x_j - a(t_j) + a(t_j) - a(t_h) - (x_h - a(t_h))$, we find as usual that

$E((x_j - x_h)^2) = M_2(t_j) + (a(t_j) - a(t_h))^2 + M_2(t_h)$

and if $h \ne k$ we find that

$E((x_j - x_h)(x_j - x_k)) = M_2(t_j) + (a(t_j) - a(t_h))(a(t_j) - a(t_k)).$

Consequently

$\sum_{h,k} E((x_j - x_h)(x_j - x_k)) = n^2 M_2(t_j) + \sum_{k=1}^{n} M_2(t_k) + \sum_{h,k} (a(t_j) - a(t_h))(a(t_j) - a(t_k)) = n^2 M_2(t_j) + \sum_{h=1}^{n} M_2(t_h) + \Bigl[\sum_{k=1}^{n} (a(t_j) - a(t_k))\Bigr]^2.$

Consequently

$E(s^2) = n^{-1} \sum_{j=1}^{n} M_2(t_j) + n^{-2} \sum_{h=1}^{n} M_2(t_h) + n^{-3} \sum_{j=1}^{n} \Bigl[\sum_{k=1}^{n} (a(t_j) - a(t_k))\Bigr]^2.$

As in the proof of Theorem 1 we see that the first of these sums tends to $M_2$ as n tends to infinity, and the second of these sums therefore tends to 0 as n tends to infinity. The last sum equals

$n^{-3} \sum_{j,h,k} [a(t_j)^2 - a(t_j)(a(t_h) + a(t_k)) + a(t_h) a(t_k)] = n^{-1} \sum_{j=1}^{n} a(t_j)^2 - 2 n^{-2} \sum_{j,h} a(t_j) a(t_h) + n^{-2} \sum_{h,k} a(t_h) a(t_k) = n^{-1} \sum_{j=1}^{n} a(t_j)^2 - \Bigl[n^{-1} \sum_{j=1}^{n} a(t_j)\Bigr]^2,$

and this expression tends to $\int_0^1 a(t)^2\, dt - \Bigl[\int_0^1 a(t)\, dt\Bigr]^2$ as n tends to infinity. But

$\int_0^1 a(t)^2\, dt - \Bigl[\int_0^1 a(t)\, dt\Bigr]^2 = \int_0^1 (a(t) - a)^2\, dt,$

since $a = \int_0^1 a(t)\, dt$, and thus we have shown that $E(s^2)$ tends to $M_2 + \int_0^1 (a(t) - a)^2\, dt$ as n tends to infinity.

If j, h, k, p, q, r are integers between 1 and n, we put

$(j, h, k; p, q, r) = E((x_j - x_h)(x_j - x_k)(x_p - x_q)(x_p - x_r)) - E((x_j - x_h)(x_j - x_k))\, E((x_p - x_q)(x_p - x_r)).$

If neither j, h nor k is equal to any of the three integers p, q, r, it follows from the independence of the variables $x_i$ that $(j, h, k; p, q, r) = 0$. Thus

$E((s^2 - E(s^2))^2) = E(s^4) - E(s^2)^2 = n^{-6} \sum{}' (j, h, k; p, q, r),$

where the summation is taken over all the values of j, h, k, p, q, r between 1 and n with the restriction that at least one of the three numbers j, h, k is equal to at least one of the three numbers p, q, r. This sum contains therefore not more than $3^2 n^5$ summands, and each of the summands is bounded, since they depend only on $a(t_i)$, $M_2(t_i)$, $M_3(t_i)$ and $M_4(t_i)$. Thus $E((s^2 - E(s^2))^2)$ is equal to $n^{-6}$ times a sum of not more than $3^2 n^5$ summands which are bounded. Hence $E((s^2 - E(s^2))^2)$ tends to 0 as n tends to infinity. Now our theorem is an immediate consequence of the criterion 2.d.

Noting that $\int_0^1 (a(t) - a)^2\, dt$ is nothing but the variance of the function a(t) (around its mean a), we obtain the following obvious consequence of Theorems 1 and 2.

Corollary: $s^2 - d^2$ converges stochastically to the variance of a(t).

Remarks similar to those made in connection with the proof of the theorem of section 3 may be made now in regard to the theorems of this section.

By similar arguments it is possible to prove that the statistic $n^{-1} \sum_{j=1}^{n-1} x_j x_{j+1}$ converges stochastically to $\int_0^1 a(t)^2\, dt$.


5. The third moment. Put $d(3) = n^{-1} \sum_{j=1}^{n-2} (x_j - x_{j+1})^2 (x_{j+1} - x_{j+2})$. Then d(3) is a function of the random sample $O(T_n)$.

Theorem 1: d(3) converges stochastically to $M_3$.

Proof: It is readily seen that

$E((x_j - x_{j+1})^2 (x_{j+1} - x_{j+2})) = M_3(t_{j+1}) + (a(t_{j+1}) - a(t_{j+2}))(M_2(t_j) + (a(t_j) - a(t_{j+1}))^2 + M_2(t_{j+1})),$

and in practically the same fashion as in the proof of Theorem 1 of section 4 one shows now that E(d(3)) tends to $M_3$ as n tends to infinity.

Furthermore we have

$E((d(3) - E(d(3)))^2) = E(d(3)^2) - E(d(3))^2 = n^{-2} \sum_{j,h} (j, h),$

where

$(j, h) = E((x_j - x_{j+1})^2 (x_{j+1} - x_{j+2})(x_h - x_{h+1})^2 (x_{h+1} - x_{h+2})) - E((x_j - x_{j+1})^2 (x_{j+1} - x_{j+2}))\, E((x_h - x_{h+1})^2 (x_{h+1} - x_{h+2})).$

Clearly (j, h) = 0 whenever j + 2 < h or h + 2 < j. Consequently there appear actually in the sum of all the (j, h) not more than 5n terms each of which is bounded by an absolute constant, since they depend only on $a(t_i)$ and on the $M_m(t_i)$ for $2 \le m \le 6$. From this fact we infer as before that $E((d(3) - E(d(3)))^2)$ tends to 0, as n tends to infinity, and our theorem is an immediate consequence of the criterion 2.d.

Remark 1. If $M_3(t)$, $M_2(t)$ and a(t) are constant, it follows from the proof that

$E(d(3)) = \frac{n-2}{n} M_3;$

and thus $(n-2)^{-1} \sum_{j=1}^{n-2} (x_j - x_{j+1})^2 (x_{j+1} - x_{j+2})$ is an unbiased estimate of $M_3$.

Remark 2. One might be tempted to use instead of d(3) the following function:

$n^{-1} \sum_{j=1}^{n-1} (x_j - x_{j+1})^3.$

By an argument of a nature rather similar to the one used in the preceding proof one may show, however, that this statistic converges stochastically to 0.
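
The two expectations behind Remarks 1 and 2 can be verified exactly for identically distributed observations. The sketch below uses a small discrete distribution chosen only for illustration (values 0, 1, 4 with probabilities 1/2, 1/3, 1/6) and enumerates all outcomes with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical skewed distribution used only for this check.
values = [Fraction(0), Fraction(1), Fraction(4)]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]


def expect(fn, k):
    """Exact expectation of fn over k independent copies of the distribution."""
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=k):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]
        total += weight * fn(*(values[i] for i in idx))
    return total


mean = expect(lambda x: x, 1)
third_central = expect(lambda x: (x - mean) ** 3, 1)                 # M_3
d3_term = expect(lambda x, y, z: (x - y) ** 2 * (y - z), 3)          # one summand of d(3)
cube_term = expect(lambda x, y: (x - y) ** 3, 2)                     # one summand of Remark 2
```

Each summand of d(3) has expectation exactly $M_3$, whereas each summand of the statistic in Remark 2 has expectation 0 by the symmetry of $x_j - x_{j+1}$, which is why the latter carries no information about the third moment.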
Put $s(3) = n^{-1} \sum_{j=1}^{n} (x_j - \bar{x})^3$. Then s(3) is a function of the random sample $O(T_n)$. Furthermore let

$F_3 = 3\Bigl\{\int_0^1 a(t) M_2(t)\, dt - a M_2 - a \int_0^1 a^2(t)\, dt\Bigr\} + 2a^3 + \int_0^1 a^3(t)\, dt.$

Theorem 2. s(3) converges stochastically to $M_3 + F_3$.

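Proof: For fixed $j$, let $X(j) = \sum_{h=1}^{n} \big(x_j - a(t_j) + a(t_h) - x_h\big)$ and $A(j) = \sum_{h=1}^{n} \big(a(t_j) - a(t_h)\big)$. Then

$$E(s(3)) = n^{-4} \sum_{j=1}^{n} E\big((X(j) + A(j))^3\big) = n^{-4} \sum_{j=1}^{n} \big(E(X(j)^3) + 3A(j)E(X(j)^2) + A(j)^3\big),$$

since $E(X(j))$ is easily seen to be 0. We find furthermore that

$$E(X(j)^3) = (n-1)^3 M_3(t_j) + E\Big(\Big[\sum_{h \ne j} (a(t_h) - x_h)\Big]^3\Big) = \big((n-1)^3 + 1\big) M_3(t_j) - \sum_{h=1}^{n} M_3(t_h);$$

$$E(X(j)^2) = (n-1)^2 M_2(t_j) + E\Big(\Big[\sum_{h \ne j} (a(t_h) - x_h)\Big]^2\Big) = \big((n-1)^2 - 1\big) M_2(t_j) + \sum_{h=1}^{n} M_2(t_h).$$

Consequently

$$E(s(3)) = n^{-4}\Big[\big((n-1)^3 - n + 1\big) \sum_{j=1}^{n} M_3(t_j) + 3\big((n-1)^2 - 1\big) \sum_{j=1}^{n} A(j) M_2(t_j) + 3 \sum_{j=1}^{n} A(j) \sum_{h=1}^{n} M_2(t_h) + \sum_{j=1}^{n} A(j)^3\Big].$$

Since furthermore

$$\sum_{j=1}^{n} A(j) = \sum_{j=1}^{n} \sum_{h=1}^{n} \big(a(t_j) - a(t_h)\big) = 0,$$

$$\sum_{j=1}^{n} A(j) M_2(t_j) = n \sum_{j=1}^{n} a(t_j) M_2(t_j) - \sum_{h=1}^{n} a(t_h) \sum_{j=1}^{n} M_2(t_j),$$

$$\sum_{j=1}^{n} A(j)^3 = \sum_{j=1}^{n} \Big[n a(t_j) - \sum_{h=1}^{n} a(t_h)\Big]^3 = n^3 \sum_{j=1}^{n} a(t_j)^3 - 3n^2 \sum_{j=1}^{n} a(t_j)^2 \sum_{h=1}^{n} a(t_h) + 3n \sum_{j=1}^{n} a(t_j) \Big[\sum_{h=1}^{n} a(t_h)\Big]^2 - n \Big[\sum_{h=1}^{n} a(t_h)\Big]^3,$$

it is easily verified that $E(s(3))$ tends to $M_3 + F_3$, as n tends to infinity.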

To prove that $E\big((s(3) - E(s(3)))^2\big)$ tends to 0 as n tends to infinity, one
proceeds as in the proofs of the preceding theorems, namely by verifying that
this expectation is $n^{-8}$ times a sum of not more than $4^4 n^7$ summands which are
bounded, since they depend only on $a(t_i)$ and on the $M_m(t_i)$ for $1 < m < 7$.
The proof of the theorem may then be completed by applying the criterion 2.d.

It is readily seen that $F_3$ vanishes whenever $a(t)$ is constant. But from

$$F_3 = 3\Big[\int_0^1 a(t) M_2(t)\,dt - aM_2\Big] + \int_0^1 (a(t) - a)^3\,dt$$

we infer that $F_3$ vanishes too whenever $M_2(t)$ is constant and $a(t)$ is at the same
time symmetric with regard to $a$, and more precisely: if $M_2(t)$ is constant, a
necessary and sufficient condition for the vanishing of $F_3$ is the vanishing of the
third moment of the function $a(t)$ around its mean. Thus we see that d(3)
is always a consistent statistic for $M_3$, though s(3) is not.

6. The fourth moment. The results in this section will be stated without
proof. Their proofs can be constructed on exactly the same lines as the proofs
in sections 4 and 5.

$$(2n)^{-1} \sum_{j=1}^{n-1} (x_j - x_{j+1})^4, \qquad n^{-1} \sum_{j=2}^{n-1} (x_{j-1} - x_j)^3 (x_{j+1} - x_j)$$

and

$$n^{-1} \sum_{j=2}^{n-1} (x_{j-1} - x_j)(x_{j+1} - x_j)^3$$

converge stochastically to $M_4 + 3\int_0^1 M_2(t)^2\,dt$.

(4n) _1 [g (Xj ~ Xj+if J converges stochastically to Mi + M\. 

$(4n)^{-1} \sum_{j=2}^{n-2} (x_{j-1} - x_j)^2 (x_{j+1} - x_{j+2})^2$ converges stochastically to $\int_0^1 M_2(t)^2\,dt$.

From these facts one easily deduces that $M_4$ is the stochastic limit of

$$n^{-1}\Big[\frac{1}{2} \sum_{j=1}^{n-1} (x_j - x_{j+1})^4 - \frac{3}{4} \sum_{j=2}^{n-2} (x_{j-1} - x_j)^2 (x_{j+1} - x_{j+2})^2\Big]$$

and that J (M 2 (t) — M 2 ) 2 dt is the stochastic limit of 

(2«) -1 £X (.Xi ~ Xj+,)* - ( X J ~ ~ g (Xj-l ~ Xj)\x y+i - *,+,)*J. 


7. Efficiency. If $f = f(O(T_n))$ is a function of the random sample $O(T_n)$,
and if $f$ converges stochastically to a number $r$, then

$$\lim_{n \to \infty} nE\big((f - r)^2\big)$$

may be considered as some sort of a measure for the efficiency 11, 12 of the statistic
$f$ as an estimate of $r$, provided, of course, the limit exists.

Theorem 1. If the function $a(t)$ is of bounded variation, then

$$\lim_{n \to \infty} nE\big((\bar{x} - a)^2\big) = M_2.$$

Proof: Clearly

$$nE\big((\bar{x} - a)^2\big) = n^{-1} E\Big(\Big[\sum_{j=1}^{n} (x_j - a)\Big]^2\Big) = n^{-1} \sum_{j=1}^{n} M_2(t_j) + n^{-1} \Big[\sum_{j=1}^{n} (a(t_j) - a)\Big]^2.$$

Now

$$\sum_{j=1}^{n} (a(t_j) - a) = \sum_{j=1}^{n} a(t_j) - na = \sum_{j=1}^{n} \Big[a(t_j) - n \int_{(j-1)/n}^{j/n} a(t)\,dt\Big].$$

Since $a(t)$ is a continuous function, there exists a number $u_j$ such that

$$(j-1)/n \le u_j \le j/n, \quad \text{and} \quad \int_{(j-1)/n}^{j/n} a(t)\,dt = n^{-1} a(u_j).$$

Thus

$$\sum_{j=1}^{n} (a(t_j) - a) = \sum_{j=1}^{n} \big(a(t_j) - a(u_j)\big).$$

But both $t_j$ and $u_j$ are between $(j-1)/n$ and $j/n$, and $a(t)$ is of bounded variation.
Hence there exists a constant $A$ which depends on $a(t)$ only and not on n
or $T_n$ such that

$$\Big|\sum_{j=1}^{n} (a(t_j) - a)\Big| < A \quad \text{for every choice of } T_n.$$


The contention of our theorem is a fairly immediate consequence of these facts. 

This theorem and its proof may serve as an additional substantiation of the 
remarks appended to section 3. 

Remark: If we had assumed only the continuity of $a(t)$ instead of its being
of bounded variation, we could have tried to argue as follows: Since $a(t)$ is continuous,
there exists to every positive number $\epsilon$ an integer $N(\epsilon)$ such that
$|a(t') - a(t'')| < \epsilon$ for $|t' - t''| < N(\epsilon)^{-1}$. Hence we would find that for $N(\epsilon) < n$
we have

$$\Big|\sum_{j=1}^{n} (a(t_j) - a)\Big| < n\epsilon;$$

and this inequality is certainly insufficient for proving that the left side of the 
inequality tends to 0 as n tends to infinity. 

Theorem 2: If the functions $a(t)$ and $M_2(t)$ are both of bounded variation, then

$$\lim_{n \to \infty} nE\big((d^2 - M_2)^2\big) = M_4.$$


11 Wilks [2], p. 134/136.

12 Or a measure for the asymptotic variance of the function $f$.
Proof: In the course of the proof of Theorem 1 of section 4 we have shown
that $E(d^2) = (2n)^{-1}(A + B - C)$, where

$$A = 2\sum_{j=1}^{n} M_2(t_j), \qquad B = \sum_{j=1}^{n-1} \big(a(t_j) - a(t_{j+1})\big)^2, \qquad C = M_2(t_1) + M_2(t_n).$$

Since $M_2(t)$ is bounded, it is clear that $n^{-1/2}C$ tends to 0 as n tends to infinity.
Since $a(t)$ is of bounded variation, there exists a constant $B^*$ such that $B < B^*$
for every choice of $T_n$, and hence $n^{-1/2}B$ tends to 0 as n tends to infinity. 13 Furthermore
we have

$$\sum_{j=1}^{n} M_2(t_j) - nM_2 = \sum_{j=1}^{n} \Big[M_2(t_j) - n\int_{(j-1)/n}^{j/n} M_2(t)\,dt\Big].$$

Because of the continuity of $M_2(t)$ there exist numbers $v_j$ such that

$$(j-1)/n \le v_j \le j/n, \quad \text{and} \quad M_2(v_j) = n\int_{(j-1)/n}^{j/n} M_2(t)\,dt.$$

Consequently

$$\sum_{j=1}^{n} M_2(t_j) - nM_2 = \sum_{j=1}^{n} \big[M_2(t_j) - M_2(v_j)\big].$$

But $M_2(t)$ is a function of bounded variation, and thus we may infer, as in the
proof of Theorem 1, that $n^{1/2}\big[(2n)^{-1}A - M_2\big]$ tends to 0 as n tends to infinity.
Combining all the facts we see that $n^{1/2}\big[E(d^2) - M_2\big]$ tends to 0 as n tends to infinity,
and hence we have shown that $n\big[E(d^2) - M_2\big]^2$ tends to 0, as n tends to
infinity.

As in the proof of Theorem 1 of section 4 we note next that

$$E\big((d^2 - E(d^2))^2\big) = (2n)^{-2} \sum_{i,j} (i, j),$$

where $(i, j) = E\big((x_i - x_{i+1})^2 (x_j - x_{j+1})^2\big) - E\big((x_i - x_{i+1})^2\big) E\big((x_j - x_{j+1})^2\big)$,
and that $(i, j) = 0$, if either $i + 1 < j$ or $j + 1 < i$. Next we observe that

$$\begin{aligned} (i, j) = {} & E\big((x_i - a(t_i) + a(t_{i+1}) - x_{i+1})^2 (x_j - a(t_j) + a(t_{j+1}) - x_{j+1})^2\big) \\ & - E\big((x_i - a(t_i) + a(t_{i+1}) - x_{i+1})^2\big)\, E\big((x_j - a(t_j) + a(t_{j+1}) - x_{j+1})^2\big) \\ & + \big(a(t_i) - a(t_{i+1})\big)(i, j)' + \big(a(t_j) - a(t_{j+1})\big)(i, j)'', \end{aligned}$$

where the expressions $(i, j)'$ and $(i, j)''$ are bounded (by a number independent
of $i$, $j$, $n$ or $T_n$).

Consequently we have

$$\begin{aligned} (i, i) & = M_4(t_i) + 6M_2(t_i)M_2(t_{i+1}) + M_4(t_{i+1}) - \big(M_2(t_i) + M_2(t_{i+1})\big)^2 + \big(a(t_i) - a(t_{i+1})\big)(i, i)^* \\ & = M_4(t_i) + M_4(t_{i+1}) + M_2(t_i)^2 + M_2(t_{i+1})^2 - 2\big(M_2(t_i) - M_2(t_{i+1})\big)^2 + \big(a(t_i) - a(t_{i+1})\big)(i, i)^*, \end{aligned}$$

where $(i, i)^* = (i, i)' + (i, i)''$ is bounded by a bound independent of $i$, $n$, $T_n$.

13 A remark similar to the one made just before stating Theorem 2 may be made here and
below about the indispensability of the hypothesis that $a(t)$ and $M_2(t)$ be of bounded variation.





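Likewise we find that

$$\begin{aligned} (i, i+1) = {} & M_4(t_{i+1}) + M_2(t_i)M_2(t_{i+1}) + M_2(t_i)M_2(t_{i+2}) + M_2(t_{i+1})M_2(t_{i+2}) \\ & - \big(M_2(t_i) + M_2(t_{i+1})\big)\big(M_2(t_{i+1}) + M_2(t_{i+2})\big) \\ & + \big(a(t_i) - a(t_{i+1})\big)(i, i+1)' + \big(a(t_{i+1}) - a(t_{i+2})\big)(i, i+1)'' \\ = {} & M_4(t_{i+1}) - M_2(t_{i+1})^2 \\ & + \big(a(t_i) - a(t_{i+1})\big)(i, i+1)' + \big(a(t_{i+1}) - a(t_{i+2})\big)(i, i+1)''. \end{aligned}$$

Hence

$$\begin{aligned} (i, i) + 2(i, i+1) = {} & M_4(t_i) + 3M_4(t_{i+1}) + \big(M_2(t_i) - M_2(t_{i+1})\big)\big(3M_2(t_{i+1}) - M_2(t_i)\big) \\ & + \big(a(t_i) - a(t_{i+1})\big)(i, i)^{+} + \big(a(t_{i+1}) - a(t_{i+2})\big)(i, i+1)'', \end{aligned}$$

where $(i, i)^{+} = (i, i)' + (i, i)'' + (i, i+1)'$ is bounded by a bound independent
of $i$, $n$, $T_n$.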
Considering that

$$\sum_{i,j} (i, j) = \sum_{i=1}^{n-1} (i, i) + 2\sum_{i=1}^{n-2} (i, i+1),$$

it is now deduced from the continuity of the functions $a(t)$, $M_2(t)$ and $M_4(t)$ that
$n\big[E(d^4) - E(d^2)^2\big]$ tends to $M_4$, as n tends to infinity. We note finally that
$E\big((d^2 - M_2)^2\big) = E\big((d^2 - E(d^2))^2\big) + \big(E(d^2) - M_2\big)^2$, and the theorem is an
immediate consequence of the facts we have deduced.

Theorem 3. If the functions $a(t)$ and $M_2(t)$ are both of bounded variation, then

$$\lim_{n \to \infty} nE\Big(\Big(s^2 - M_2 - \int_0^1 (a(t) - a)^2\,dt\Big)^2\Big) = M_4 - \int_0^1 M_2(t)^2\,dt + 4\int_0^1 \big(a(t)M_3(t) - aM_3\big)\,dt + 4\int_0^1 M_2(t)\big(a(t) - a\big)^2\,dt.$$
J o •'O •'O 


Proof. Since $a(t)$ and $M_2(t)$ are of bounded variation, we show, as in the
proofs of the two preceding theorems, that

$$n^{1/2}\Big(n^{-1}\sum_{j=1}^{n} a(t_j) - a\Big), \quad n^{1/2}\Big(n^{-1}\sum_{j=1}^{n} a(t_j)^2 - \int_0^1 a(t)^2\,dt\Big), \quad \text{and} \quad n^{1/2}\Big(n^{-1}\sum_{j=1}^{n} M_2(t_j) - M_2\Big)$$

all tend to 0, as n tends to infinity. In the proof of Theorem 2 of section 4 we
computed $E(s^2)$. Using this result we obtain:

$$\begin{aligned} n^{1/2}\Big(E(s^2) - M_2 - \int_0^1 (a(t) - a)^2\,dt\Big) = {} & n^{1/2}\Big(n^{-1}\sum_{j=1}^{n} M_2(t_j) - M_2\Big) - n^{-1/2}\, n^{-1}\sum_{j=1}^{n} M_2(t_j) \\ & + n^{1/2}\Big(n^{-1}\sum_{j=1}^{n} a(t_j)^2 - \int_0^1 a(t)^2\,dt\Big) + n^{1/2}\Big(a^2 - \Big[n^{-1}\sum_{j=1}^{n} a(t_j)\Big]^2\Big), \end{aligned}$$

where one should remember the identity $\int_0^1 (a(t) - a)^2\,dt = \int_0^1 a(t)^2\,dt - a^2$.

But

$$n^{1/2}\Big(a^2 - \Big[n^{-1}\sum_{j=1}^{n} a(t_j)\Big]^2\Big) = n^{1/2}\Big(a - n^{-1}\sum_{j=1}^{n} a(t_j)\Big)\Big(a + n^{-1}\sum_{j=1}^{n} a(t_j)\Big),$$

where the last factor on the right is bounded by a bound independent of n and
$T_n$. Hence it follows that

$n^{1/2}\Big(E(s^2) - M_2 - \int_0^1 (a(t) - a)^2\,dt\Big)$ tends to 0, as n tends to infinity.

By a computation of great length and little interest one shows that

$$\begin{aligned} nE\big((s^2 - E(s^2))^2\big) = n^{-3}\Big[ & (n-1)^2 \sum_{j=1}^{n} M_4(t_j) + 4n(n-1) \sum_{j=1}^{n} M_3(t_j) a(t_j) \\ & - 4(n-1) \sum_{j=1}^{n} M_3(t_j) \sum_{h=1}^{n} a(t_h) + 2\Big[\sum_{j=1}^{n} M_2(t_j)\Big]^2 \\ & - (n^2 - 2n + 3) \sum_{j=1}^{n} M_2(t_j)^2 + 4n^2 \sum_{j=1}^{n} M_2(t_j) a(t_j)^2 \\ & - 8n \sum_{j=1}^{n} a(t_j) \sum_{h=1}^{n} a(t_h) M_2(t_h) + 4\Big[\sum_{j=1}^{n} a(t_j)\Big]^2 \sum_{h=1}^{n} M_2(t_h)\Big]. \end{aligned}$$

It is readily seen that this expression tends to

$$M_4 + 4\int_0^1 M_3(t) a(t)\,dt - 4M_3 a - \int_0^1 M_2(t)^2\,dt + 4\int_0^1 M_2(t) a(t)^2\,dt - 8a\int_0^1 a(t) M_2(t)\,dt + 4a^2 M_2,$$

and now it is clear how to complete the proof of our theorem.

Corollary 1. If $a(t)$ is constant and $M_2(t)$ of bounded variation, then

$$\lim_{n \to \infty} nE\big((s^2 - M_2)^2\big) = M_4 - \int_0^1 M_2(t)^2\,dt.$$

This is an almost immediate consequence of Theorem 3, since $a(t) = a$, if
$a(t)$ is constant.

It has been shown in section 4 that $d^2$ is always a consistent estimate of $M_2$
whereas $s^2$ is a consistent estimate of $M_2$ if, and only if, $a(t)$ is constant. Theorem 2
and Corollary 1 offer a basis for comparing the efficiency of these two
statistics. Since

$$0 < M_2(t)^2 < M_4(t) \quad \text{for every } t$$

(apart from trivial exceptions), we infer from Theorem 2 and Corollary 1 the
following fact.

Corollary 2. If $a(t)$ is constant and $M_2(t)$ of bounded variation, then

$$\lim_{n \to \infty} \frac{E\big((s^2 - M_2)^2\big)}{E\big((d^2 - M_2)^2\big)} = 1 - \frac{\int_0^1 M_2(t)^2\,dt}{M_4},$$

and this expression is always positive and smaller than 1.

Thus we may say roughly that for large n the estimate $s^2$ of $M_2$ is more efficient
than the estimate $d^2$, in case both may be used. 14 We do, however, not offer
any information of the necessary size of n. Neither do we claim that for small
n it might not happen that $d^2$ gives a good estimate and $s^2$ a poor one.
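
As a worked instance of Corollary 2, the limiting ratio can be evaluated when $M_2(t)$ and $M_4(t)$ are constant, so that the integral of $M_2(t)^2$ is simply $M_2^2$. The sketch below assumes exactly that special case; the function name `limiting_efficiency_ratio` is illustrative only.

```python
from fractions import Fraction


def limiting_efficiency_ratio(m2, m4):
    """Limit of E((s^2 - M_2)^2) / E((d^2 - M_2)^2) when a(t), M_2(t) and
    M_4(t) are all constant, so that the integral of M_2(t)^2 equals m2**2."""
    if not 0 < m2 * m2 < m4:
        raise ValueError("need 0 < M_2^2 < M_4")
    return (m4 - m2 * m2) / m4


# Normal errors with unit variance: M_2 = 1, M_4 = 3.
normal_ratio = limiting_efficiency_ratio(Fraction(1), Fraction(3))

# Errors uniform on (-sqrt(3), sqrt(3)): M_2 = 1, M_4 = 9/5.
uniform_ratio = limiting_efficiency_ratio(Fraction(1), Fraction(9, 5))
```

Both ratios lie strictly between 0 and 1, in agreement with the corollary; the flatter the error distribution, the larger the advantage of $s^2$ over $d^2$ in this limiting sense.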
TESTING THE HOMOGENEITY OF POISSON FREQUENCIES

By Paul G. Hoel 
University of California at Los Angeles 

1. Introduction. The standard procedure for testing the homogeneity of a
set of k Poisson frequencies seems to be to apply the Poisson index of dispersion
to those frequencies. The originators of this procedure [1] pointed out that this
procedure may be regarded as a $\chi^2$ test of goodness of fit in which the Poisson
frequencies constitute observed frequencies corresponding to k cells with equal
expected values. Somewhat later it was shown [2] that the corresponding likelihood
ratio test was approximately equivalent to the index of dispersion test.
Then the problem was approached from the viewpoint of conditional variation
[3], [4]. This approach permitted exact tests to be studied in some detail for
small samples. A few years later an exact test for the special case of $k = 2$
was introduced and studied [5]. In this investigation consideration was given for
the first time to the efficiency of the proposed test. Tables of critical regions
for the test and tables for computing the power of the test corresponding to
certain alternatives were made available.

In spite of the desirable features of this last test, it still possesses certain drawbacks.
First, this test, as well as the others referred to, did not consider the
problem in which the rate of occurrence of a rare event is constant but for which
the sampling units differ in size. For example, these methods were not designed
to enable one to test whether a factory’s accident rate had remained unchanged
during the past month as compared with the preceding three months. Second,
in order to use this test it is necessary to possess the special tables or charts of
critical regions constructed for the test.

In this paper a method which does not require special tables is considered for 
dealing with these more general situations. In the course of the development 
it is shown that this method is, in a certain sense, the best method possible for 
testing the hypothesis of homogeneity against one sided alternatives. Since this 
paper is principally concerned with removing the undesirable features of the 
method advocated in the last mentioned paper, it is advisable to read that paper 
in conjunction with this one. The procedure to be followed here will be to derive 
a uniformly most powerful test, show that it is equivalent to a $\chi^2$ test, and then
compare it with the previously mentioned test. 


2. Similar regions. In the following two sections a study will be made of the 
efficiency of a generalization of the critical region proposed in [5]. For this 
purpose let $x$ and $y$ represent sample frequencies from two independent Poisson
distributions with means $m_x$ and $m_y$. The probability of obtaining this sample
is given by

$$(1) \qquad P(x, y) = \frac{e^{-m_x} m_x^{x}}{x!} \cdot \frac{e^{-m_y} m_y^{y}}{y!}.$$

Following the notation and procedure given in [5], let

$$(2) \qquad \mu = m_x + m_y, \qquad p = \frac{m_x}{m_x + m_y}, \qquad n = x + y.$$


Then algebraic manipulation will show that $P(x, y)$ reduces to

$$(3) \qquad P(x, y) = \frac{e^{-\mu} \mu^{n}}{n!} \cdot \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}.$$

The hypothesis which it is desired to test is that

$$(4) \qquad \frac{m_y}{m_x} = r,$$

where $r$ has been specified. The value of $r$ will often be the ratio of the sizes of
the two populations under consideration or the ratio of the time units of the two
samples. In many situations the alternatives to (4) which are of interest will
be one-sided. For example, after a factory has instituted a safety campaign,
it would be of interest to see if the rate was unaffected as against the possibility
of the rate having decreased; hence the alternatives to (4) would be

$$(5) \qquad \frac{m_y}{m_x} < r.$$

In terms of the parameters introduced in (2), the hypothesis (4) and its alternatives
(5) become

$$(6) \qquad p = \frac{1}{1 + r} \quad \text{and} \quad p > \frac{1}{1 + r}.$$

Consider the probability given by (3) in much the same manner as was done
in [5]. This probability depends upon two parameters, $\mu$ and $p$, only the latter
of which is specified by the hypothesis; consequently if critical regions independent
of $\mu$ are desired, it will be necessary to find similar regions [6] with respect
to $\mu$. Since $x$ and $y$ are discrete variables, it is not possible to find similar regions
of arbitrary size; consequently it will be necessary to introduce continuous
approximating functions if such regions are desired and if best critical regions
are to be found. Toward this end consider the expression for $P(x, y)$ in (3).
It states that the probability that $x$ and $y$ will take on specified values is the
Poisson probability that the sample point will fall on the line $x + y = n$, multiplied
by the binomial conditional probability that the point will have the specified
$x$ coordinate when the point is known to lie on this line. If $p$ and $n$ are not small,
this binomial function could be approximated well by means of a normal function.
Or, if desired, factorials could be replaced by corresponding gamma functions
and the necessary normalizing factor introduced. Regardless of what continuous
function is chosen, a region on each line $x + y = n$ ($n = 0, 1, 2, \ldots$)
can be selected such that the conditional probability for this approximating
function is $\alpha$ that a point on that line will lie in that region. Most natural
approximating functions would become trivial for $n = 0$; therefore it may be
necessary to choose an artificial function for this case or to adopt a convention
of letting the origin be the critical region for this case but accepting only $100\alpha$
percent of samples for which $n = 0$ as belonging to this critical region. The
totality of such $\alpha$ regions will constitute a critical region of size $\alpha$ which is independent
of $\mu$ because from (3) the probability of a point lying in this critical region
would now be given by

$$\sum_{n=0}^{\infty} \frac{e^{-\mu} \mu^{n}}{n!}\, \alpha = \alpha \sum_{n=0}^{\infty} \frac{e^{-\mu} \mu^{n}}{n!} = \alpha.$$

Thus, similar regions with respect to $\mu$ of size $\alpha$ can be obtained by selecting
regions of size $\alpha$ on each line $x + y = n$.

The preceding method for obtaining similar regions is the only method for
doing so if such regions are restricted to be found on the lines $x + y = n$, because
if a region of size $\alpha_n$ were selected on each line $x + y = n$, it would be necessary that

$$\sum_{n=0}^{\infty} \frac{e^{-\mu} \mu^{n}}{n!}\, \alpha_n = \alpha$$

independent of $\mu$. This is equivalent to requiring that

$$e^{\mu} = \sum_{n=0}^{\infty} \frac{\alpha_n}{\alpha} \frac{\mu^{n}}{n!},$$

but since the power series for $e^{\mu}$ is unique, it follows that $\alpha_n = \alpha$.


3. Common best critical region. Among these similar regions there will exist
a best critical region for testing the hypothesis $p = p_0$ against the single alternative
$p = p_1$ if there exist best critical regions on each line $x + y = n$. From (6)
it will be observed that this formulation is equivalent to testing the hypothesis
$r = r_0$ against the single alternative $r = r_1$. The best critical region [6] on such
a line, if it exists, will be that region which satisfies the inequality

$$(7) \qquad \frac{f(x; p_0)}{f(x; p_1)} \le k,$$

where $f$ denotes the continuous function selected to approximate the binomial
distribution on this line and $k$ is a constant determined so that the probability,
under the hypothesis $p = p_0$, will be $\alpha$ that a point on this line will lie in this
region. If the normal approximating function with $m = np$ and $\sigma^2 = npq$ is
used, (7) becomes

$$(8) \qquad \sqrt{\frac{p_1 q_1}{p_0 q_0}}\; \exp\Big\{\frac{1}{2}\Big[\frac{(x - np_1)^2}{np_1 q_1} - \frac{(x - np_0)^2}{np_0 q_0}\Big]\Big\} \le k.$$

After completing the square in $x$, it will be found that this inequality reduces to

$$(9) \qquad \Big(\frac{1}{p_1 q_1} - \frac{1}{p_0 q_0}\Big)\Big[x - \frac{n(1/q_1 - 1/q_0)}{1/(p_1 q_1) - 1/(p_0 q_0)}\Big]^2 \le c,$$

where $c$ is independent of $x$.
If $x_0$ is a value of $x$ such that

$$(10) \qquad P[x > x_0 \mid p = p_0] = \alpha,$$

then (9) will hold for $x > x_0$ provided that $p_1 > p_0$. To demonstrate this fact,
it is convenient to consider the three cases $p_0 + p_1 > 1$, $p_0 + p_1 < 1$ and $p_0 + p_1 = 1$
separately. If $p_0 + p_1 > 1$,

$$\frac{1}{p_1 q_1} - \frac{1}{p_0 q_0} > 0, \qquad \frac{1}{q_1} - \frac{1}{q_0} > 0, \qquad \frac{1}{q_1} - \frac{1}{q_0} > \frac{1}{p_1 q_1} - \frac{1}{p_0 q_0},$$

and therefore

$$x \le n < n\Big(\frac{1}{q_1} - \frac{1}{q_0}\Big) \Big/ \Big(\frac{1}{p_1 q_1} - \frac{1}{p_0 q_0}\Big).$$

Since the coefficient of the brackets in (9) which involves $x$ is positive, increasing $x$ will reduce the
left side of (9). If $p_0 + p_1 < 1$,

$$\frac{1}{p_1 q_1} - \frac{1}{p_0 q_0} < 0 \quad \text{and} \quad \frac{n(1/q_1 - 1/q_0)}{1/(p_1 q_1) - 1/(p_0 q_0)} < 0.$$

Since the coefficient is now negative, increasing $x$ will reduce the left side of (9).
Finally, if $p_0 + p_1 = 1$, (9) will reduce to

$$\Big(\frac{1}{p_1} - \frac{1}{p_0}\Big)\Big[x - \frac{n}{2}\Big] \le k.$$

Since $1/p_1 - 1/p_0 < 0$, increasing $x$ will decrease the left side of this inequality.
It therefore follows that the region defined by (10) is a best critical region for
every alternative of the form $p_1 > p_0$ on the line $x + y = n$. The totality of
such regions for $n > 0$, together with the previously mentioned convention for
$n = 0$, then constitutes a common best critical region among all possible similar
regions for testing the hypothesis (4) against the set of alternatives (5).

In a similar manner it will be found that if the inequality in (10) is reversed, 
the critical region so defined, together with the * convention, will constitute a 
common best critical region for every alternative of the form $p_1 < p_0$. If the
alternative hypotheses consist of $p \ne p_0$, there will not exist a common best
critical region using these approximating functions. 

The critical region proposed in [5] is that for the special hypothesis $p_0 = \frac{1}{2}$ and
the set of alternatives $p \ne p_0$. It will be found that the lower half of this critical
region for P = $2\alpha$ will differ little, except for very small samples, from that given
by (10) for this special case; however, it possesses the disadvantage of being
numerical and therefore of requiring a special table. The critical region given
by (10) does not possess this disadvantage. This fact will be demonstrated in
the next section.


4. Chi-square test. Consider the problem of testing compatibility between
observed and expected frequencies in two cells. Let $x$ and $y$ represent the observed
frequencies and $e_x$ and $e_y$ the expected frequencies in a sample of size $n$.
If the probability that an observation will fall in the first cell is, as in (6), $p = \frac{1}{1 + r}$, then

$$e_x = np = \frac{x + y}{1 + r}$$

and

$$e_y = n(1 - p) = \frac{r(x + y)}{1 + r}.$$

The chi-square function for testing compatibility then reduces to

$$(11) \qquad \chi^2 = \sum_{i=1}^{2} \frac{(o_i - e_i)^2}{e_i} = \frac{(y - rx)^2}{r(y + x)}.$$

Let $\chi_0^2$ be the value of $\chi^2$ such that $P[\chi^2 > \chi_0^2] = 2\alpha$ for one degree of freedom.
With $\chi^2$ replaced by $\chi_0^2$ in (11), this equation determines a parabola in the $x$, $y$
plane. If $x + y = n$ is not small, the probability of a point on the line $x + y = n$
lying outside of this parabola will be approximately $2\alpha$, the accuracy depending
on the accuracy of the $\chi^2$ approximation, and hence the probability of a point
lying outside of and below this parabola will be approximately $\alpha$. Thus, a critical
region for testing $p = p_0$ against $p > p_0$ will be given by that part of the positive
$x$, $y$ plane which lies below this parabola. In Figure 1 the lower half of this
parabola for the special case of $p_0 = \frac{1}{2}$ is indicated by the symbol $\chi^2$. The critical
region for the alternatives $p < p_0$ would be the region lying above the upper half
of this same parabola, while the critical region for the alternatives $p \ne p_0$ would
consist of both of these regions at the $2\alpha$ level. For one degree of freedom, $\chi$
has a standard normal distribution; consequently the critical region given by
(11) is the same as that given by (10) in which a normal approximation is used
on each line $x + y = n$. This equivalence is easily verified by replacing $y$ by
$n - x$ and $r$ by $q/p$ in (11).
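
The reduction in (11) can be confirmed numerically. The sketch below computes the two-cell statistic both as the sum over cells and from the closed form, using exact fractions; the function names and the accident counts are hypothetical and serve only as an illustration of the factory example mentioned in the introduction.

```python
from fractions import Fraction


def chi_square_two_cells(x, y, r):
    """Sum of (observed - expected)^2 / expected over the two cells,
    with expected frequencies n*p and n*(1 - p), p = 1 / (1 + r)."""
    n = x + y
    p = Fraction(1) / (1 + r)
    expected = (n * p, n * (1 - p))
    observed = (x, y)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_closed_form(x, y, r):
    """Closed form (y - r*x)^2 / (r*(x + y)) from equation (11)."""
    return Fraction(y - r * x) ** 2 / (r * (x + y))


# Hypothetical counts: 30 accidents in the preceding three months and
# 4 in the past month, so the ratio of time units is r = 1/3.
r = Fraction(1, 3)
statistic = chi_square_closed_form(30, 4, r)
```

With these invented counts the statistic is 54/17, roughly 3.18, which would then be compared with $\chi_0^2$ for one degree of freedom at the $2\alpha$ level, taking into account the side on which the point lies.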


5. Likelihood ratio test. The chi-square test of the preceding section yields
a common best critical region for testing (4) against (5) for the normal approximation.
It is interesting to compare this critical region with that obtained by
the maximum likelihood principle, which requires no such approximations.
Consider, therefore, the two dimensional parameter space 

$$\Omega: \quad m_x \ge 0, \quad m_y \ge 0,$$

and the subspace

$$\omega: \quad \frac{m_y}{m_x} = r.$$

Maximizing $P$ in (1) over $\Omega$ yields $\hat{m}_x = x$ and $\hat{m}_y = y$. Maximizing $P$ over
$\omega$, treating $P$ as a function of $m_x$, yields $\hat{m}_x = (x + y)/(1 + r)$. Then the maximum
likelihood ratio becomes

$$\lambda = \frac{\max P_{\omega}}{\max P_{\Omega}} = \frac{e^{-(x+y)} \Big(\dfrac{x + y}{1 + r}\Big)^{x+y} r^{y} \Big/ (x!\,y!)}{e^{-(x+y)}\, x^{x} y^{y} \big/ (x!\,y!)}.$$

This reduces to

$$(12) \qquad \lambda = \Big(\frac{x + y}{1 + r}\Big)^{x+y} \frac{r^{y}}{x^{x} y^{y}}.$$

For a fixed value of $\lambda$, this equation determines a curve in the $x$, $y$ plane which
may be used to determine a critical region. Since $-2 \log \lambda$ is known to possess
an asymptotic chi-square distribution under certain conditions [7], choose as
critical region that part of the positive $x$, $y$ plane lying below the curve determined
by (12) when $\lambda$ has been replaced by $\lambda_0$, where $\lambda_0$ is determined from $-2 \log \lambda_0 = \chi_0^2$.
This curve may be plotted by reducing it to the parametric form

$$x = \frac{\log \lambda_0}{(1 + v) \log \dfrac{1 + v}{1 + r} + v \log \dfrac{r}{v}}, \qquad y = vx.$$

A comparison of the critical regions corresponding to (11), (12), and a slight
modification of [5] for the special case of $p_0 = \frac{1}{2}$ and $\alpha = .05$ is given in the accompanying
sketch. The modification of [5] consists in choosing $x_0$ to be that integer
which most nearly satisfies (10), rather than to be the smallest integer for which
the left side of (10) does not exceed $\alpha$. The latter method of choosing $x_0$ has a
tendency to make the first type of error considerably smaller than $\alpha$ for small
values of $n$. It will be observed that there are no appreciable differences between
the maximum likelihood and chi-square critical regions. Furthermore, it will
be found that there are only two values of $n$, namely $n = 3$ and $n = 9$, for $n < 30$
for which the chi-square test and the modification of [5] might yield different
decisions at this significance level.

The preceding sections show that the chi-square test is highly satisfactory for 
testing the homogeneity of two Poisson frequencies, except possibly for very 
small frequencies, and that therefore special numerical tables are not necessary. 


6. Several Poisson frequencies. The generalization of (11) for a set of k
frequencies is, of course, the ordinary chi-square function

$$(13) \qquad \chi^2 = \sum_{i=1}^{k} \frac{(x_i - np_i)^2}{np_i},$$

where $n = \sum_{i=1}^{k} x_i$, $p_i$ is proportional to the sampling unit from which $x_i$ was
obtained, and $\sum_{i=1}^{k} p_i = 1$. The Poisson index of dispersion is merely a special
case of (13) when $p_i = 1/k$. The adequacy of (13) for this special case has been
studied elsewhere [3], [8], while studies of (13) in general are numerous and well
known.
SOME COMBINATORIAL FORMULAS ON MATHEMATICAL

EXPECTATION

By L. C. Hsu

National Southwest Associated University, Kunming, China

The main problem considered here may be stated as follows:

Let $f_1(x), \ldots, f_n(x)$ be n polynomials. It is the purpose of this paper to
establish formulas concerning the mathematical expectation (probable value)
of the product

$$f_1(x_1) \cdots f_n(x_n),$$

where $x_1, \ldots, x_n$ are positive random variables and the sum of these is supposed
known.

Before establishing the formulas let us introduce some notations for con¬ 
venience. 

1. Notation. (A) In this paper the notation $(m; k; x_1, \ldots, x_n)$ or $(m; k; x)$
is used to denote that a set of numbers $(x_1, \ldots, x_n)$ is over all different compositions
of $m$ into $n$ parts with each $x \ge k$, i.e. over all different integer solutions of
the equation $x_1 + \cdots + x_n = m$ with each $x \ge k$.

(B) Let $m$, $\delta$ be two positive real numbers. The notation $E(m, \delta, [f_1] \cdots [f_n])$
denotes the mathematical expectation of the product $f_1(x_1) \cdots f_n(x_n)$ in which
the sum $m = x_1 + \cdots + x_n$ is known and for every $x_\nu$ ($\nu = 1, \ldots, n$) the value
of $x_\nu/\delta$ is a positive integer. The notation $E(m, \delta, [f_1] \cdots [f_n])$ thus implies that
the value of $m$ is a multiple of $\delta$. We call the $\delta$ a “varying unit”, i.e. the least
possible difference between two different quantities x* and Xji j. The nota¬ 
tion E(m6 , [/]") is merely a special case that denotes the mathematical expecta¬ 
tion of the product fi(x\) • • • / n (x n ) under the known conditions 

a**+••■ + *.j ='[f]^ i, 

(r = 1, • • • , »), 

where [ ] represents “integral part of”. 

(C) In order to simplify our formulas we always denote/(x) by f x) ,f Pl + • • • 

+ fu Ly fn • • • a^d l.pi + • + k.pk by <r(p) or <r. It is a convention that 

~ 0 f° r m < n * 


2. Lemmas. Lemma 1. Let $m, r_1, \ldots, r_n$ be non-negative integers. Then

$$(1) \qquad \sum_{(m;0;x)} \prod_{\nu=1}^{n} \binom{x_\nu}{r_\nu} = \binom{m + n - 1}{r_1 + \cdots + r_n + n - 1}.$$
Proof: The lemma follows immediately by considering the coefficient of the
term $t^{m}$ on both sides of

$$\frac{t^{r_1}}{(1 - t)^{r_1 + 1}} \cdots \frac{t^{r_n}}{(1 - t)^{r_n + 1}} = \frac{t^{r_1 + \cdots + r_n}}{(1 - t)^{r_1 + \cdots + r_n + n}}.$$
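
Lemma 1 lends itself to a brute-force check for small arguments. The sketch below assumes the reading of the lemma as a sum over all compositions of $m$ into $n$ non-negative parts; the helper names `compositions`, `lemma1_lhs` and `lemma1_rhs` are illustrative and not from the paper.

```python
from math import comb, prod


def compositions(m, n, smallest=0):
    """Yield every n-tuple of integers >= smallest whose sum is m."""
    if n == 1:
        if m >= smallest:
            yield (m,)
        return
    for first in range(smallest, m - smallest * (n - 1) + 1):
        for rest in compositions(m - first, n - 1, smallest):
            yield (first,) + rest


def lemma1_lhs(m, r):
    """Sum over compositions of m into len(r) parts of prod C(x_v, r_v)."""
    return sum(
        prod(comb(x, rv) for x, rv in zip(xs, r))
        for xs in compositions(m, len(r))
    )


def lemma1_rhs(m, r):
    """Binomial coefficient C(m + n - 1, r_1 + ... + r_n + n - 1)."""
    n = len(r)
    return comb(m + n - 1, sum(r) + n - 1)
```

For example, with $m = 3$ and $r = (1, 0)$ both sides equal 6. Enumeration grows quickly with $m$ and $n$, so this is a check for small cases only, not a substitute for the generating-function argument.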

Lemma 2. Let a, b, c, • • • be any constants , and k\ , A* , h , • • • any 
integers . Then 


( 2 ) 


,£,aKt) +6 fe) + '6;) + ••■] 


n! 


E ( t 


- 1 W 

• • + n — 1/ cr! 


a° 6* c Y 


m + n 

+ + 7^3 4 " * • • + W— 1/ a! 0! 7! 

Proof: Expanding the left-hand side of (2) we see that the coefficient of the
term $a^{\alpha} b^{\beta} c^{\gamma} \cdots$ is equal to


n! 




alfily! 

By Lemma 1 it becomes 


n! / 
a!pi 7 ! \aki 


m + 

+ P k 2 + yk 3 


h ) \ h ) 

n — 1 \ 

\ + • * • + n — 1/ 


Hence the lemma. 

Lemma 3. Let $m$, $n$ ($\le m$) be two positive integers. Then, for any given polynomial
$f(x)$ of the kth degree, we have


( 3 ) 


E /(*i) • • • /(*») = «! 


y /m + n — l\ pr [(/ — l) 00 ]*’ 
(nlkp) \<r + n— 1/i-o p„! 


where f^ — f(x)> a ~ <r(p) = l.pi + * * * + kp k . 

Proof: Since /(x) is a polynomial of the fcth degree, there exist (k + 1) values 
pk , * * * , Po such that 



= /(*)■ 


By putting x = 0,1, • • • , k, it is orderly determined that 

- (;)/ ( '~ l> + ••• + (-i)’(')/* = (/-i) w , 0-0,1, ••• ,k). 


'The lemma is thus obtained by (2). 

For convenience we denote the summation X) ( m > 1; x) fi(xi) • • • /„( x n ) by 
S(m> [/x] - • • [/«]). Thus the formula (3) can be written as 


S(m f t/] n ) - n! 


y /m + n —1\A [(/—l) w r 

(»;o;p) \ or 4 " ft 1/ p p | 
Lemma 4. Let $f_1(x), \ldots, f_n(x)$ be n given polynomials. Then

(4) S(m, l/J • • • [/„]) - I £ (-1 r- k S(m, [/.,+ ••• + /,»]"), 

n\ 

lifts n 

where (v\ • • • v&) rims over oW different combinations out of (1 • • • n), k ~ 1, 

Proof: The proof depends essentially on the formal logic theorem. Con¬ 
sidering a typical term 

n\ 

. . S(m, [/,,]*' • • • [/„]"), 1 < t < n, q x +•••+«.=«, 

we see that it is contained in the last (n — t + 1) summations of the righthand 
side of (4), i.e. in the summations (n • • • ?*) as k = t, t + 1, • • • , n. The num¬ 
ber of occurrences of the term in the right-hand side of (4) is therefore 

— t\ _ 0 if t > n 
v ) ~~ 1 if t — n. 

The term vanishes generally except when qi = • • • = q t = 1. Hence the right- 
hand side gives 

S(m , [/,] • * ■ [/«]). 



3. Theorems with formulas. In the following statements of theorems and
corollaries, the notation $(x_1 \cdots x_n)$ is always to denote a set of undetermined
quantities, though the kind of the quantities of the set is stated.

Theorem 1. Let $(x_1 \cdots x_n)$ be a set of natural numbers under a known condition
$x_1 + \cdots + x_n = m$. Then, for any given polynomial $f(x)$ of the kth degree, we have


(5) 


E(m, 1, [/]") 


y' /m + n — l\yj [(/ — 1)T* 
(m - l\ („Xp) \* + » - 1/ M Pr! 
\n-l/ 


Proof: Let $m' = m + nr$. By lemma 1 we then have


/ xi\ /x n \ _ _/ra' — nr + n 

(ioTx) Vo/ ’ AO/ ™ V n - 1 



This is the number of compositions of $m'$ into $n$ parts with each part $\ge r$. In
particular, for $r = 1$ we see that the number of compositions of $m$ into $n$ parts is
$\binom{m-1}{n-1}$. Thus by the definition of mathematical expectation, the required
value is equal to


S(m, l/D (m - lV 1 

$(w, [l] n ) ’ I e * \n — 1/ 

The theorem is therefore proved by Lemma 3. 


S(m, (/] n ). 
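
The definition used in this proof can be made concrete by direct enumeration. The sketch below takes the expectation of the product to be the sum over all compositions of $m$ into $n$ positive integer parts divided by their number, as the proof states; it does not implement formula (5), and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, prod


def positive_compositions(m, n):
    """Yield every n-tuple of positive integers whose sum is m."""
    if n == 1:
        if m >= 1:
            yield (m,)
        return
    for first in range(1, m - (n - 1) + 1):
        for rest in positive_compositions(m - first, n - 1):
            yield (first,) + rest


def expectation_of_product(m, n, f):
    """Average of f(x_1) * ... * f(x_n) over all compositions of m into
    n positive parts; f is assumed to return integers."""
    comps = list(positive_compositions(m, n))
    total = sum(prod(f(x) for x in xs) for xs in comps)
    return Fraction(total, len(comps))


count = len(list(positive_compositions(7, 3)))
value = expectation_of_product(5, 2, lambda x: x)
```

For $m = 7$ and $n = 3$ the enumeration finds 15 compositions, matching $\binom{6}{2}$, and for $f(x) = x$, $n = 2$, $m = 5$ the average of $x_1 x_2$ is 5. Such small cases give a way to test any closed formula for the expectation against the definition.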
Corollary 1. Let $(x_1 \cdots x_n)$ be a set of positive quantities of which the varying
unit is $\delta$, and the sum is $m$. Then, for any given polynomial $f(x)$ of the kth
degree, we have



where 


$$g(x) = f(\delta x), \qquad \sigma = 1 \cdot p_1 + \cdots + k p_k.$$

Proof: It is deduced by the relation $E(m, \delta, [f(x)]^n) = E(m/\delta, 1, [f(\delta x)]^n)$.

Corollary 2. Let $(x_1 \cdots x_n)$ be a set of non-negative real numbers under a
known condition $x_1 + \cdots + x_n = m$. Then, for any given polynomial $f(x) = a_0 + \cdots + a_k x^k$, we have


(7) 

where 


«(w, 0, [/D 


_ (n!) 2 


(nio-i) (o’ + n — 1)1 


(Oiflo) g ° 

go! 


(fclq*) g * 
g*! 5 


an 9* 0, o- = o-(g) = qi + * * * + kq k . 


Proof: The proof of the corollary depends essentially on the concept that two 
different real numbers may differ by an arbitrarily small number h. 

Let h be an arbitrary positive number and let f(xh) = h k g(x, h), where the 
number k is the degree of fix). Then, since 


n 


r-0 



(n - v) f 


we may write 



if p > n 

if p = n 

if p = n + 1, 


§ (-1) * 0 9{v ~ S ’V= + h-R,{h)], 

where lim R v (h) 

h-+Q 



v\a ,+\. 


Now we pass to the limit h —» 0, in which it is assumed that h runs through a se¬ 
quence of rational numbers of the form l/N. Thus by Corollary 2 we have 


lim B(m, h, [/] n ) 

h -*0 


»>(»-«, E r-J—pnll 

(n;0;p) ((TtW“ 1;! y-0 




Hence the corollary. 

It may be noted that this corollary can also be independently deduced from the ratio of the two integrals

$$\int \cdots \int f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_{n-1} \; : \; \int \cdots \int dx_1 \cdots dx_{n-1},$$

where the integrals are all taken over the region $R$: $x_1 + \cdots + x_n = m$, $x_1 > 0, \cdots, x_n > 0$.

Corollary 3. Let $(x_1 \cdots x_n)$ be a set of positive real numbers under a known condition $a \le x_1 + \cdots + x_n \le b$, where $a$, $b$ are non-negative numbers. Then, for any given polynomial $f(x) = a_0 + \cdots + a_k x^k$ ($a_k \ne 0$), the mathematical expectation of the product $f(x_1) \cdots f(x_n)$, which we denote by $E((a, b), 0, [f]^n)$, is given by the formula

$$E((a, b), 0, [f]^n) = \frac{n!\,(n-1)!}{b - a} \sum_{(n;0;q)} \frac{b^{1+\sigma(q)} - a^{1+\sigma(q)}}{(1 + \sigma(q)) \cdot (n - 1 + \sigma(q))!} \cdot \frac{(a_0\, 0!)^{q_0}}{q_0!} \cdots \frac{(a_k\, k!)^{q_k}}{q_k!}.$$

Proof: Since the required mathematical expectation is the mean

$$\frac{1}{b - a} \int_a^b E(u, 0, [f]^n)\, du,$$

Corollary 3 follows from Corollary 2.

On the other hand we see that

$$\lim_{h \to 0} E((a, a + h), 0, [f]^n) = E(a, 0, [f]^n).$$

Hence Corollary 2 can also be deduced from Corollary 3.

Theorem 2. (First generalization of Theorem 1). Let $f_1(x), \cdots, f_n(x)$ be $n$ given polynomials, of which the highest degree is $k$. Then we have


E(m, !,[/,]•••[/„]) = E E (-1)""' 

( n ;o;p) 

liiin 


( m + n — l\ 
tr + n - 1/ 

(»-'0 


[(/.,.- i) w r- 


where 

Proof: In the proof of Theorem 1 we have seen that

$$E(m, 1, [f]^n) = \binom{m-1}{n-1}^{-1} S(m, [f]^n).$$


Thus, by similar reasoning and lemma 4, we have 

E(m, 1,1/J •••[/»]) - E ^ .]"). 


<*1* ■•*'•) n \ 

n yn - l) 
The theorem is proved by Lemma 3.

Corollary 1. Let $\delta$ be a varying unit. Then

E(m, 8, [/,] •••[/.])=» E E (-l) n ~* 

1 



where 


g>(x) = /,(«*), ffr, -r. = g,, + • • • + g ,,. 

Proof: By the relation $E(m, \delta, [f_1(x)] \cdots [f_n(x)]) = E(m/\delta, 1, [f_1(\delta x)] \cdots [f_n(\delta x)])$ we obtain the corollary.

Corollary 2. For any positive real number $m$, we have

(11) $$E(m, 0, [x^{p_1}] \cdots [x^{p_n}]) = \frac{(n-1)!\; p_1! \cdots p_n!\; m^{p_1 + \cdots + p_n}}{(p_1 + \cdots + p_n + n - 1)!}.$$

Proof: Since $E(m, \delta, [f_1] \cdots [f_n]) = \sum (-1)^{n-\nu}/n! \; E(m, \delta, [f_{r_1 \cdots r_\nu}]^n)$, we have, by letting $\delta \to 0$,

$$E(m, 0, [f_1] \cdots [f_n]) = \sum \frac{(-1)^{n-\nu}}{n!}\, E(m, 0, [f_{r_1 \cdots r_\nu}]^n).$$

The corollary is therefore deduced by (7).

Theorem 3. (Second generalization of Theorem 1). Let $(x_1 \cdots x_n)$ be a set of integers under known conditions $x_1 + \cdots + x_n = m$, $a \le x_i \le b$, where $m$, $a$, $b$ are given integers. Then, for any given polynomial $f(x)$, the mathematical expectation of the product $f(x_1) \cdots f(x_n)$, denoted by $E_{(ab)}(m, 1, [f]^n)$, is given by the formula

(12) 2? (m, 1, [/]") = ^ 

< ' W E ( _ ir 

where 

$g(x) = f(b + x)$, $h(x) = f(a + x - 1)$ and $m' = m - (a - 1)n + (a - b - 1)\nu$.

Proof: Define $S(m, [f]^n) = 0$ for $m < n$, and $S(m, [f]^0) = 1$ for $m = 0$. We shall now prove that

E (-ir (”) [ ff rth]"-') = E /(so • • • /(*»), 

*--0 V/ (*!-.. x n ) 

a b 


s(m r , mr') 
where on the right-hand side of the expression the set $(x_1, \cdots, x_n)$ under the summation runs over all different compositions of $m$ into $n$ parts and

$$a \le x_\nu \le b, \quad \nu = 1, \cdots, n.$$

For convenience we denote the left-hand side of the expression by $\Theta$, that is,

= Z (- 1 )' Z [/(x + b)D S(m' - m, [f(x + a - l)]" - ')- 

,_0 V/ 

Let $f(x_1) \cdots f(x_n)$ be a product term contained in $\Theta$, i.e., $x_1 + \cdots + x_n = m$; $x_\nu \ge a$. We assume that $x_{\nu_1} \ge b + 1, \cdots, x_{\nu_t} \ge b + 1$, where $\nu_i \ne \nu_j$ if $i \ne j$. Then it is seen that the number of occurrences of the product term in $\Theta$ is given by

$$\sum_{s=0}^{t} (-1)^s \binom{t}{s} = \begin{cases} 0 & \text{if } t \ge 1, \\ 1 & \text{if } t = 0. \end{cases}$$

Thus the product term $f(x_1) \cdots f(x_n)$ of $\Theta$ vanishes except when $a \le x_\nu \le b$, $\nu = 1, \cdots, n$.

Hence we have

$$\Theta = \sum_{a \le x \le b} f(x_1) \cdots f(x_n).$$


Next, we shall find the number of different compositions of $m$ into $n$ parts with each $a \le x_\nu \le b$, i.e., the number of product terms of $\Theta$. By the above result we see that the number is given by


IIH y 


r—0 m—r 



Z 1 £ 1 

(«;!;*) (m'—m; l;x) 



Hence the theorem.

This theorem shows that the mathematical expectation $E_{(ab)}(m, 1, [f]^n)$ can be expressed by $S(m, [g]^p)$ and is therefore expressible in terms of linear combinations of the coefficients of the polynomial $f(x)$.

Corollary 1. Let $\delta$ be a varying unit for which $m/\delta$, $a/\delta$, $b/\delta$ are all integers. Then

$$E_{(ab)}(m, \delta, [f(x)]^n) = E_{((a/\delta),(b/\delta))}\!\left(\frac{m}{\delta}, 1, [f(\delta x)]^n\right).$$

Corollary 2. Let $f_1(x), \cdots, f_n(x)$ be $n$ given polynomials. Then


E (m, 1, [/i] • • • [/„]) 
(«&) 




1SIS» 


* (m f 1, [/, r ..,J n ). 

(a,ft) 





Corollary 3. The number of integral solutions of the equation $x_1 + \cdots + x_n = m$ with $a_1 \le x_1 \le b_1, \cdots, a_n \le x_n \le b_n$ is equal to

$$\sum_{\nu_1, \cdots, \nu_n = 0}^{1} (-1)^{\nu_1 + \cdots + \nu_n} \binom{m + n - (a_1 + \cdots + a_n) + (a_1 - b_1 - 1)\nu_1 + \cdots + (a_n - b_n - 1)\nu_n - 1}{n - 1}.$$

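Proof: We have shown that the number of integral solutions of the equation $x_1 + \cdots + x_n = m$ with $a \le x_\nu \le b$ is given by

$$\sum_{\nu=0}^{n} (-1)^{\nu} \binom{n}{\nu} \binom{m - (a - 1)n + (a - b - 1)\nu - 1}{n - 1}.$$

Hence the number of integral solutions of the equation $x_{11} + \cdots + x_{1 n_1} + \cdots + x_{s1} + \cdots + x_{s n_s} = m$ with $a_\nu \le x_{\nu\mu} \le b_\nu$, ($\nu = 1, \cdots, s$; $\mu = 1, \cdots, n_\nu$), is given by

$$\sum_{m_1 + \cdots + m_s = m} \; \prod_{i=1}^{s} \; \sum_{\nu_i = 0}^{n_i} (-1)^{\nu_i} \binom{n_i}{\nu_i} \binom{m_i - (a_i - 1)n_i + (a_i - b_i - 1)\nu_i - 1}{n_i - 1}$$

$$= \sum_{\nu_1, \cdots, \nu_s} (-1)^{\nu_1 + \cdots + \nu_s} \binom{n_1}{\nu_1} \cdots \binom{n_s}{\nu_s} \binom{m - (a_1 - 1)n_1 - \cdots - (a_s - 1)n_s + (a_1 - b_1 - 1)\nu_1 + \cdots + (a_s - b_s - 1)\nu_s - 1}{n_1 + \cdots + n_s - 1}.$$

The corollary follows at once by putting $n_1 = \cdots = n_s = 1$, $s = n$.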
This corollary can be restated in a more interesting manner as follows:

Let there be $n$ storerooms, and let $b_1, \cdots, b_n$ be the numbers of stocks contained in the 1st, 2nd, $\cdots$, $n$-th storerooms respectively. Then $m$ stocks containing at least $a_i$ stocks of the $i$-th storeroom ($i = 1, \cdots, n$) can be chosen from these $n$ storerooms in

$$\sum_{\nu_1, \cdots, \nu_n = 0}^{1} (-1)^{\nu_1 + \cdots + \nu_n} \binom{m + n + (a_1 - b_1 - 1)\nu_1 + \cdots + (a_n - b_n - 1)\nu_n - a_1 - \cdots - a_n - 1}{n - 1}$$

different ways.
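The counting formula of Corollary 3 can be checked numerically on small cases. The following is only a sketch: the function names are hypothetical, and it assumes that a binomial coefficient with a negative upper index is to be read as zero, which is the convention that makes the inclusion-exclusion sum agree with direct enumeration.

```python
from itertools import product
from math import comb


def binom(top, bottom):
    """Binomial coefficient, taken as 0 when the upper index is negative (assumed convention)."""
    return comb(top, bottom) if top >= 0 else 0


def count_bounded(m, a, b):
    """Inclusion-exclusion count of integer solutions of
    x_1 + ... + x_n = m with a[i] <= x_i <= b[i]."""
    n = len(a)
    total = 0
    # Each nu_i is 0 or 1: nu_i = 1 marks a variable forced above its upper bound.
    for nu in product((0, 1), repeat=n):
        top = m + n - sum(a) + sum((ai - bi - 1) * v for ai, bi, v in zip(a, b, nu)) - 1
        total += (-1) ** sum(nu) * binom(top, n - 1)
    return total


def count_brute(m, a, b):
    """Direct enumeration of the same solutions, for comparison."""
    ranges = [range(ai, bi + 1) for ai, bi in zip(a, b)]
    return sum(1 for x in product(*ranges) if sum(x) == m)
```

For instance, with lower bounds `[0, 1, 2]` and upper bounds `[3, 4, 5]` the two functions can be compared for every total `m` between 0 and 14.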

So far we have established several combinatorial formulas concerning the 
mathematical expectation of the product fi(x x ) • • • f n (x n ) under certain con¬ 
ditions. In the next section, we shall explain how to apply these formulas. 


4. Applications. (a) A criterion. In order to make the above formulas applicable to practical problems we state a criterion as follows: The mathematical expectation of a function $F(x_1, \cdots, x_n)$ can be estimated by the above combinatorial formulas if and only if the sum of these undetermined quantities $x_1, \cdots, x_n$ is known and there exist $n$ polynomials $f_1(x), \cdots, f_n(x)$ such that
F*fi , • • • , F «/„, where the quantities xi , • • • , x n may or may not be conti¬ 
nuous. When the quantities are discontinuous, the varying unit is certainly 
given. 

(b) Some approximations. For $f(x) = \beta_0 + \cdots + \beta_k x^k$ ($\beta_k \ne 0$) we may write
(/- 1) W 

• ■-0 


where S,, t is a Stirling number of the second kind, as used by Jordan, and de¬ 
fined by 

-is,. = 

x—0 V®/ 

Thus, the formulas (5) and (9) can be written as follows: 


(50 


(90 


E(m, 1, [/)") 


V (*» + » — 1)1 (w — »)1 wl (» — 1) 1 

(»l5fp) (m — <r)!(<r + n — l)!(m — 1)! 

. ][[ (ft &■» 4~ • • • + ft Sw,k) P r 

p,! 


E(m, 1, [/,] • • • [/„]) = £ £ (-1)"- 

(n;0;p) 

1 

(m + n - 1)1 (m - n)ln\(n - 1)! jt (B, 8,,, + •• • + B h B , h ) p * 
(m — <r)!(<r + n — l)!(w — 1)! ,Jb p, 1 , 


where 


S„ = , /< = &0 + * * ’ + +- + Pni • 

Now we state some convenient formulas concerning the number S„ . 

If m is sufficiently large and t is smaller than n% the following recurrence rela¬ 
tion is useful: 


w l =x.(- + r l ) +x ‘( m t + +r 1 ) 

(13) + • + x <-«( W 2 t-2 0 

S mim+t = (7+i‘) + l(f + l)Xo + 2Xj 

+ ••• + [(2* — l)X<-2 + fX^d( m + 
where $\lambda_0 = 1$, $\lambda_t = 0$ and $\lambda_1, \cdots, \lambda_{t-1}$ are all independent of $m$.

Starting from the first equality and using the recurrence relation $S_{m,n+1} = m S_{m,n} + S_{m-1,n}$ successively we have

m 

"S x {s(T+/+r) (< +'+»+£(“ *+7 > +!) J 

= § x {(<+ 7 + 2) (<+i+i)+ C+/+1) +1} ] 

= g [( < + + (1 + jw( t *7+ j, 

where X_i = X*_i = 0. The recurrence relation is thus deduced. 

Writing

$$S_{m,m+t} = \binom{m+t}{t+1} + \lambda_1 \binom{m+t}{t+2} + \cdots + \lambda_{t-1} \binom{m+t}{2t},$$

and using the recurrence relation as obtained above, the coefficients $\lambda_1, \cdots, \lambda_{t-1}$ may be exhibited as follows:

 t    λ_1     λ_2       λ_3        λ_4         λ_5         λ_6         λ_7         λ_8
 1
 2      3
 3     10      15
 4     25     105       105
 5     56     490      1260        945
 6    119    1918      9450      17325       10395
 7    246    6825     56980     190575      270270      135135
 8    501   22935    302995    1636635     4099095     4729725     2027025
 9   1012   74316   1487200   12122110    47507460    94594500    91891800    34459425
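The tabulated coefficients can be regenerated and checked against Stirling numbers of the second kind. This is a sketch under stated assumptions: the general step $\lambda_j(t) = (j+1)\lambda_j(t-1) + (t+j)\lambda_{j-1}(t-1)$ is not quoted from the text but is chosen so that it reduces to the two special cases given for $\lambda_{t-1}(t)$ and $\lambda_{t-2}(t)$; the function names are hypothetical; and $S_{m,m+t}$ is read as the number of partitions of $m+t$ objects into $m$ classes.

```python
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind via the alternating-sum formula."""
    total = sum((-1) ** (k - x) * comb(k, x) * x ** n for x in range(k + 1))
    return total // factorial(k)


def lambda_row(t):
    """Return [lambda_0(t), ..., lambda_{t-1}(t)] with lambda_0 = 1.

    Assumed recurrence: lambda_j(t) = (j+1)*lambda_j(t-1) + (t+j)*lambda_{j-1}(t-1),
    with lambda_{t-1}(t-1) taken as 0.
    """
    row = [1]  # t = 1
    for s in range(2, t + 1):
        prev = row + [0]
        row = [1] + [(j + 1) * prev[j] + (s + j) * prev[j - 1] for j in range(1, s)]
    return row


def stirling_via_lambda(m, t):
    """Evaluate S_{m,m+t} from the binomial expansion with coefficients lambda_j(t)."""
    return sum(lam * comb(m + t, t + 1 + j) for j, lam in enumerate(lambda_row(t)))
```

Comparing `stirling_via_lambda(m, t)` with `stirling2(m + t, m)` for small `m` and `t` gives an independent check on each row of the table.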


Now let 

s„. n+ ,=[(” 71)+ Xi (o (r+ 2) + *" + x< - 1 (t) ( n 2t *)] n[ • 

The recurrence relation obtained above gives 
$$\lambda_{t-1}(t) = (2t - 1)\,\lambda_{t-2}(t - 1),$$

$$\lambda_{t-2}(t) = 2(t - 1)\,\lambda_{t-3}(t - 1) + (t - 1)\,\lambda_{t-2}(t - 1).$$


Xi-i (() 


(201 

<!2‘ ‘ 


X«_i (0 = (< - i)!£2'-*- l v 



Thus we obtain 
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—gift- 

Since the orders • * • ,^ ^ Q are all less than 2t as n 


and since 


(•+0‘-»-a(i) , S( ,+ 4 J ) 

-i©'0+3S(*-5) 

-ft(l7( 1+ n) (1 - 0( ”" )) 

_( 1+ ^)(0I, 

fe-i) >"-<') - r^+i (*i ‘)«- w »“*<» 

-^fiCr( i+t ^X0ii 

4'<o(y+iK.-') W .i 


We may write (by Stirling’s formula) 




< + 4< ( 2 /) 9 ^ + e » ) 


where $\epsilon_n \to 0$ as $n \to \infty$.

Now it is easily proved that the inequality 

holds for every positive integer $x$. We have, therefore,

® (t)> Si/1 > I 
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^< 4 'C 

Using these inequalities we have 

i, = l VHt + 2)* < 4‘ Vo < \\/izr x («* - D = «<. 

where it may be noted that 


lim ~ = 1. 


Hence we have in conclusion 


<i4) § (-ir- (;) »■« - (?)‘ (i) ^ (i+ m±i±±) , 


where 


3 K ’ Vt 3 


VrS«' - *>• 

Evidently the formula (14) implies (15) and (16): 

±(-ir'Qx' 


„»+< 


(15) 


(16) 


©■(0i^(‘ + i)' 


n! ~ \Z2rn. 


t - 0(n 1 -), 


> 0. 


(Stirling’s formula). 
ON THE CONSTITUENT ITEMS OF THE REDUCTION AND THE
REMAINDER IN THE METHOD OF LEAST SQUARES

By S. Vajda 

London 

1. Consider a set of variates $y_i$, ($i = 1, 2, \cdots, n$), which are normally and independently distributed with variance 1. Let also a matrix $(x_{ik})$ with $i = 1, 2, \cdots, n$; $k = 1, 2, \cdots, s$ and rank $s$ be given. Find $b_1, \cdots, b_s$ in terms of the $y_i$ so that

$$\psi^2 = \sum_i \Bigl(y_i - \sum_k x_{ik} b_k\Bigr)^2$$

is a minimum. This minimum value shall be denoted by $\psi^2_{\min}$.

It is known (see e.g. R. A. Fisher, "Applications of Student's distribution", Metron Vol. 5, Part 3 (1925)) that $\psi^2_{\min}$ varies as does $\chi^2$ with $n - s$ degrees of freedom and that it is possible to express $\psi^2_{\min}$ as the sum of $n - s$ squares of linear functions of the $y_i$. In the following lines $\sum_i y_i^2$ will be expressed as the sum of $n$ squares of such functions which are independent and of variance 1. The sum of the first $s$ squares will equal $\sum_i y_i^2 - \psi^2_{\min}$ and therefore the remaining $n - s$ squares equal $\psi^2_{\min}$.

Thus a simple way will be found of writing down explicitly the linear functions, whose existence only was proved by Professor Fisher in Metron.

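2. We first calculate $\psi^2_{\min}$.

$\dfrac{\partial \psi^2}{\partial b_l} = 0$, for $l = 1, 2, \cdots, s$, gives the normal equations

(1) $$\sum_{i=1}^{n} x_{il} y_i = \sum_{i=1}^{n} \sum_{k=1}^{s} x_{il} x_{ik} b_k,$$

which can be written

(2) $$\sum_{i=1}^{n} x_{il} y_i = \sum_{k=1}^{s} X_{lk} b_k$$

with

$$X_{lk} = \sum_{i=1}^{n} x_{il} x_{ik}.$$

It follows from (1) that

(A) $$\psi^2_{\min} = \sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n} \sum_{l=1}^{s} \sum_{k=1}^{s} x_{il} x_{ik} b_l b_k = \sum_{i=1}^{n} y_i^2 - \sum_{l=1}^{s} \sum_{k=1}^{s} X_{lk} b_l b_k,$$

where the $b$ are solutions of (1).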
3. A second expression for $\psi^2_{\min}$ can be found as follows:

Introducing

$$c_i = \sum_{k=1}^{s} x_{ik} b_k$$

we obtain from (1)

(3) $$\sum_{i=1}^{n} x_{il} c_i = \sum_{i=1}^{n} x_{il} y_i, \quad (l = 1, 2, \cdots, s).$$

Now if $z_{iu}$, ($u = s + 1, \cdots, n$), are any $n - s$ independent solutions of

$$\sum_{i=1}^{n} x_{il} z_{iu} = 0, \quad (l = 1, 2, \cdots, s),$$

then the $c_i$ satisfy also

(4) $$\sum_{i=1}^{n} z_{iu} c_i = 0, \quad (u = s + 1, \cdots, n).$$

Let such a set of $z_{iu}$ be chosen. Then (3) will be solved by

(5) $$c_i = y_i - \sum_{v=s+1}^{n} z_{iv} \lambda_v$$

with $\lambda_v$ as indefinite factors and these $c_i$ satisfy (4), if

(6) $$\sum_{i=1}^{n} z_{iu} y_i = \sum_{v=s+1}^{n} \sum_{i=1}^{n} z_{iu} z_{iv} \lambda_v, \quad (u = s + 1, \cdots, n), \quad \text{or} \quad \sum_{i=1}^{n} z_{iu} y_i = \sum_{v=s+1}^{n} Z_{uv} \lambda_v$$

with

$$Z_{uv} = \sum_{i=1}^{n} z_{iu} z_{iv}.$$

Because of (2) the equation (A) can be transformed into

$$\psi^2_{\min} = \sum_{i=1}^{n} y_i^2 - \sum_{l=1}^{s} \sum_{i=1}^{n} x_{il} y_i b_l = \sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n} y_i c_i = \sum_{i=1}^{n} \sum_{v=s+1}^{n} y_i z_{iv} \lambda_v,$$

which is, because of (6),

(B) $$\psi^2_{\min} = \sum_{u=s+1}^{n} \sum_{v=s+1}^{n} Z_{uv} \lambda_u \lambda_v,$$

where the $\lambda$ are solutions of (6).

The comparison of (A) and (B) gives

$$\sum_{i=1}^{n} y_i^2 = \sum_{l=1}^{s} \sum_{k=1}^{s} X_{lk} b_l b_k + \sum_{u=s+1}^{n} \sum_{v=s+1}^{n} Z_{uv} \lambda_u \lambda_v,$$

where the first form on the r.h.s. shows the reduction of $\sum y_i^2$ by the method of least squares and the second form constitutes the remainder.
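The split of $\sum y_i^2$ into reduction and remainder can be checked on a small numerical case. The sketch below takes $s = 1$, $n = 3$ and uses exact rational arithmetic; the function name and the particular choice of the two vectors orthogonal to the single column of $x$ are illustrative assumptions (any two independent solutions would serve), and it requires $x_{11} \ne 0$.

```python
from fractions import Fraction as F


def decomposition(x, y):
    """Case s = 1, n = 3: return (sum of y^2, reduction, remainder, psi_min^2)."""
    x1, x2, x3 = x
    # Normal equation (2) for the single coefficient b: sum x_i y_i = X_11 b.
    X11 = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / X11
    # Two independent vectors orthogonal to x (one admissible choice of the z_iu).
    z = [(-x2, x1, 0), (-x3, 0, x1)]
    # Z_uv = sum_i z_iu z_iv, and the left-hand sides sum_i z_iu y_i of (6).
    Z = [[sum(p * q for p, q in zip(zu, zv)) for zv in z] for zu in z]
    r = [sum(p * q for p, q in zip(zu, y)) for zu in z]
    # Solve the 2x2 system (6) for the lambdas by Cramer's rule.
    det = Z[0][0] * Z[1][1] - Z[0][1] * Z[1][0]
    lam = [(r[0] * Z[1][1] - Z[0][1] * r[1]) / det,
           (Z[0][0] * r[1] - Z[1][0] * r[0]) / det]
    reduction = X11 * b * b
    remainder = sum(Z[u][v] * lam[u] * lam[v] for u in range(2) for v in range(2))
    # Direct residual sum of squares at the least-squares solution.
    psi_min = sum((yi - xi * b) ** 2 for xi, yi in zip(x, y))
    return sum(yi * yi for yi in y), reduction, remainder, psi_min
```

With `x = [F(1), F(2), F(3)]` and `y = [F(2), F(-1), F(4)]`, the first returned value equals the sum of the second and third, and the third equals the fourth, as (A) and (B) require.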

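4. These two forms must now be expressed in terms of the $y_i$.

We introduce the notations

$$X^{(1)} = X_{11}, \quad X^{(2)} = \begin{vmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{vmatrix}, \quad \cdots, \quad X^{(s)} = \begin{vmatrix} X_{11} & \cdots & X_{1s} \\ \vdots & & \vdots \\ X_{s1} & \cdots & X_{ss} \end{vmatrix},$$

$$Z^{(s+1)} = Z_{s+1,s+1}, \quad Z^{(s+2)} = \begin{vmatrix} Z_{s+1,s+1} & Z_{s+1,s+2} \\ Z_{s+2,s+1} & Z_{s+2,s+2} \end{vmatrix}, \quad \cdots$$

It is well known (and can easily be verified) that

$$\sum_{l=1}^{s} \sum_{k=1}^{s} X_{lk} b_l b_k = \frac{1}{X^{(1)}} (X_{11} b_1 + \cdots + X_{1s} b_s)^2 + \frac{1}{X^{(1)} X^{(2)}} \left( X^{(2)} b_2 + \cdots + \begin{vmatrix} X_{11} & X_{1s} \\ X_{21} & X_{2s} \end{vmatrix} b_s \right)^2 + \cdots + \frac{1}{X^{(s-1)} X^{(s)}} \left( X^{(s)} b_s \right)^2,$$

which may be written

$$\frac{1}{X^{(1)}} \Bigl( \sum_{k=1}^{s} X_{1k} b_k \Bigr)^2 + \frac{1}{X^{(1)} X^{(2)}} \begin{vmatrix} X_{11} & \sum_k X_{1k} b_k \\ X_{21} & \sum_k X_{2k} b_k \end{vmatrix}^2 + \cdots + \frac{1}{X^{(s-1)} X^{(s)}} \begin{vmatrix} X_{11} & X_{12} & \cdots & \sum_k X_{1k} b_k \\ \vdots & \vdots & & \vdots \\ X_{s1} & X_{s2} & \cdots & \sum_k X_{sk} b_k \end{vmatrix}^2 .$$

Using (2), this can be expressed in terms of the $y_i$ instead of $b_k$ as follows:

(7) $$\frac{1}{X^{(1)}} \Bigl( \sum_{i=1}^{n} x_{i1} y_i \Bigr)^2 + \frac{1}{X^{(1)} X^{(2)}} \begin{vmatrix} X_{11} & \sum_i x_{i1} y_i \\ X_{21} & \sum_i x_{i2} y_i \end{vmatrix}^2 + \cdots + \frac{1}{X^{(s-1)} X^{(s)}} \begin{vmatrix} X_{11} & X_{12} & \cdots & \sum_i x_{i1} y_i \\ \vdots & \vdots & & \vdots \\ X_{s1} & X_{s2} & \cdots & \sum_i x_{is} y_i \end{vmatrix}^2 .$$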
Similarly by (6) the second form can be transformed into

(8) $$\frac{1}{Z^{(s+1)}} \Bigl( \sum_{i=1}^{n} z_{i,s+1} y_i \Bigr)^2 + \cdots + \frac{1}{Z^{(n-1)} Z^{(n)}} \begin{vmatrix} Z_{s+1,s+1} & \cdots & \sum_i z_{i,s+1} y_i \\ \vdots & & \vdots \\ Z_{n,s+1} & \cdots & \sum_i z_{in} y_i \end{vmatrix}^2 .$$

The rank of $(x_{ik})$ is $s$, so that the order of the suffixes can always be chosen so as to make the above denominators different from zero.

Thus both the reduction and the remainder have been expressed by sums of squares, whose numbers correspond to the "degrees of freedom" $s$ and $n - s$ respectively.


5. It remains to be shown that the linear functions of the $y_i$ appearing in each form are mutually orthogonal and that in every one of them the sums of the squares of the coefficients are unity.

Now if we call the $n$ linear forms which occur above $\sum_{j=1}^{n} a_{ij} y_j$, ($i = 1, 2, \cdots, n$), then our proof implies that

$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \Bigl[ \sum_{j=1}^{n} a_{ij} y_j \Bigr]^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} a_{ij} a_{ik} y_j y_k .$$

This is an identity for any $y_i$, hence we must have

$$\sum_{i=1}^{n} a_{ij} a_{ik} = 1 \text{ if } j = k \quad \text{and} \quad = 0 \text{ if } j \ne k.$$

We have thus shown that the matrix $(a_{ij})$ is orthogonal and it follows that

$$\sum_{i=1}^{n} a_{ji} a_{ki} = 1 \text{ if } j = k \quad \text{and} \quad = 0 \text{ if } j \ne k.$$

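6. In practical applications the $x_{ik}$ will be given and if the expression (7) or (8) is to be written down we must first solve the set of equations

$$\sum_{i=1}^{n} x_{il} z_{iu} = 0, \quad (l = 1, 2, \cdots, s).$$

We may assume that

$$\begin{vmatrix} x_{11} & \cdots & x_{s1} \\ \vdots & & \vdots \\ x_{1s} & \cdots & x_{ss} \end{vmatrix} \ne 0.$$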
There exist, of course, an infinity of solutions. A very simple one can be found if the matrix $(x_{ik})$ is completed into a square matrix by adding 1 in the diagonal places and 0 elsewhere. We obtain

$$\begin{vmatrix} x_{11} & \cdots & x_{s1} & x_{s+1,1} & \cdots & x_{n1} \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{1s} & \cdots & x_{ss} & x_{s+1,s} & \cdots & x_{ns} \\ 0 & \cdots & 0 & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 1 \end{vmatrix}$$

The minors of the terms of any of the $(s+1)$th, $\cdots$, $n$th line give one of $n - s$ independent sets of solutions for the $z_{iu}$.

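If, e.g. $s = 1$, then the $z_{iu}$ are

$$\begin{matrix} -x_{21} & x_{11} & 0 & 0 & \cdots \\ -x_{31} & 0 & x_{11} & 0 & \cdots \\ -x_{41} & 0 & 0 & x_{11} & \cdots \end{matrix}$$

etc.

and the $Z$ are

$$\begin{matrix} x_{11}^2 + x_{21}^2, & x_{21} x_{31}, & x_{21} x_{41} \\ x_{21} x_{31}, & x_{11}^2 + x_{31}^2, & x_{31} x_{41} \\ x_{21} x_{41}, & x_{31} x_{41}, & x_{11}^2 + x_{41}^2 \end{matrix}$$

etc.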


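Hence, for $s = 1$, $n = 2$,

$$\psi^2_{\min} = \sum_{i=1}^{2} y_i^2 - \frac{1}{x_{11}^2 + x_{21}^2} (x_{11} y_1 + x_{21} y_2)^2 = \frac{1}{x_{11}^2 + x_{21}^2} (-x_{21} y_1 + x_{11} y_2)^2$$

and for $s = 1$, $n = 3$

$$\psi^2_{\min} = \sum_{i=1}^{3} y_i^2 - \frac{(x_{11} y_1 + x_{21} y_2 + x_{31} y_3)^2}{x_{11}^2 + x_{21}^2 + x_{31}^2}$$

$$= \frac{1}{x_{11}^2 + x_{21}^2} (-x_{21} y_1 + x_{11} y_2)^2 + \frac{1}{(x_{11}^2 + x_{21}^2) \begin{vmatrix} x_{11}^2 + x_{21}^2 & x_{21} x_{31} \\ x_{21} x_{31} & x_{11}^2 + x_{31}^2 \end{vmatrix}} \begin{vmatrix} x_{11}^2 + x_{21}^2 & -x_{21} y_1 + x_{11} y_2 \\ x_{21} x_{31} & -x_{31} y_1 + x_{11} y_3 \end{vmatrix}^2 .$$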






1 1 1 1 •••• 2 1 
1 1 1 1 •••• 1 2 


and 



The sum of squares into which $\psi^2_{\min}$ can be transformed is then found to be

$$\frac{1}{1 \cdot 2} (-y_1 + y_2)^2 + \frac{1}{2 \cdot 3} (-y_1 - y_2 + 2y_3)^2 + \frac{1}{3 \cdot 4} (-y_1 - y_2 - y_3 + 3y_4)^2 + \cdots .^{1}$$


1 This is the result contained in a paper by J. O. Irwin, "Independence of the constituent items in the analysis of variance" Suppl. Roy. Stat. Soc. Jour. Vol. 1 (1934).
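The last decomposition lends itself to a quick numerical check. The sketch below assumes that it refers to the case $s = 1$ with every $x_{i1} = 1$, so that $\psi^2_{\min}$ is the sum of squares of the $y_i$ about their mean; that reading, and the function names, are assumptions made for illustration.

```python
from fractions import Fraction as F


def sum_sq_about_mean(y):
    """Sum of squared deviations of the y_i from their mean, in exact arithmetic."""
    mean = F(sum(y)) / len(y)
    return sum((v - mean) ** 2 for v in y)


def contrast_sum(y):
    """Sum over k of (k*y_{k+1} - y_1 - ... - y_k)^2 / (k (k+1)), k = 1, ..., n-1."""
    total = F(0)
    for k in range(1, len(y)):
        contrast = k * y[k] - sum(y[:k])
        total += F(contrast ** 2) / (k * (k + 1))
    return total
```

For `y = [3, 1, 4, 1, 5]` both functions return the same rational number, and the same agreement holds for any other list of integers or fractions.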



NOTES 

This section is devoted to brief research and expository articles, notes on 
methodology and other short items. 


ON THE ANALYSIS OF A CERTAIN SIX-BY-SIX FOUR-GROUP 
LATTICE DESIGN USING THE RECOVERY OF 
INTER-BLOCK INFORMATION 

By Boyd Harshbarger 1 

Virginia Agricultural Experiment Station 

1. Introduction. A detailed description for a six-by-six four-group lattice
design is given in a recent article [1] by the author, and the analysis is developed
which uses only the intra-block information to correct the varieties for the block
effects. Here is developed the analysis that makes use of both the intra- and the
inter-block information.

Referring to Group X on page 307, [1], since block (1) contains varieties 1 to 6,
and block (2) contains varieties 7 to 12, the difference between the means of 
these two blocks is also an estimate of the difference between the first six varieties 
and the second six varieties. The information obtained from such inter-block 
comparisons was ignored in the previous analysis. In attempting to use this 
information, the chief difficulty is to decide how estimates derived from the 
comparison of block totals shall be combined with the previous estimates. 
Since each block consists of six plots, comparisons between block totals may be 
expected to have a higher error variance than the within-block comparisons,
just as in split-plot designs the main block comparisons usually have a higher 
error than the sub-plot comparisons. The problem is, therefore, to estimate 
the relative error variances of the inter- and intra-block comparisons, and then 
to combine the two types of estimates to the best advantage. 

2. Calculations of the adjusted varietal totals. In addition to the equations
(7), [1], which contain all the intra-block information, we now have the additional 
set of equations, 

$B_i = 6\mu + (\text{sum varietal constants in this block}) + e_i$, which are estimated by

$B_i = 6m + \Sigma v_{bi} + E_i$.

In these equations and all the following equations, the double prime symbol 
(") used in [1] is omitted, but the statistics have the same meaning as in equations 
(7), [1] except in this paper they are adjusted by both inter- and intra-block 
information. 

1 The author wishes to express his appreciation to W. G. Cochran of Iowa State College, 
who advised in the preparation of this analysis. 
The general problem is to minimize the function,


F = WSiya - m - vj - b,f + - 6m - Zv bi ) 2 

' o 

86 1 1 k j 

subject to the restriction Vj = 0 and 2 £ = 0, and where IF = -rand 

/■•l e^x i«l d 

w = \. 

<*b 

Following the method given in [1] the typical block equations for b xl • • • b*« is 
bxl = 6 3 W + T W (4Bl1 “ Tn) = 6 3FT1P' Cxl 


and for b ,\ • • • b« 6 is 


- m\winFmr+W) I(25r2 + 22 w ' + Tnc “ 

+ (W - W')\C ,3 + C*)] + (C, 2 + Cm + <?..)}• 

It can be seen that for W f = 0, b xi and b xl are the intra-block values given in 
[1] and for W' = W they are the randomized block values. 

A typical adjustment varietal total then becomes 

4^1 + 4 m = V\ — (bxi + by i + b ti + b u 2 ). 

3. Estimation of W and W'. Following the method presented by Cochran [6] and Yates [3], the error of a block total may be written as

$$E_i = e_{i1} + e_{i2} + \cdots + e_{i6} + 6 b_i,$$

where

$$V(e) = \sigma^2 \quad \text{and} \quad V(b_i) = \sigma_b^2.$$

Hence $V(E_i) = 6\sigma^2 + 36\sigma_b^2$ and component (a) is thus an estimate of $\sigma^2 + 6\sigma_b^2$.
One finds from evaluating the expected value of (15), [1] corrected for replicates, 


<«-¥> 


that the expected value of component (b) is a 2 + J-6 al 


In the analysis of variance if components (a) and (b) are pooled, one obtains the 
block variance B as an estimate of a 2 + | • 6<r 2 . Since the intra-block variance 
is an estimate of <r 2 the estimates of the true variance between* blocks, tr 2 + 6tr 2 , 
. 8 B-E 1 

18 7 W' * 

4. Standard error of adjusted varietal means. The standard error of the 
difference between the adjusted means of two varieties which appear together in 
the same blocks in groups Z or U, is 

1 r,, , *w 1 


kW [ (fc 


3TF + W\ 
obtained by the method outlined by Cochran. Similarly, for the case in which
the varieties are together in the same block in groups Z or U.

When an attempt is made to express the difference between these two adjusted 
varieties which appear together in the same block in groups X or Y in terms of 
the levels of the main effects and interactions, the interactions are no longer 
unconfounded and the method employed above breaks down.

If one is willing to assume that the formula for the variance of the difference 
between two adjusted varietal means for varieties which appear together in the 

I f BW \ 

same block in the groups X or Y is of the form — ^ l A + 

constants may be determined by the values already known, [1]. This form can 
be shown to be that for a quadruple lattice. 

1 / BW \ 

The formula ( A + W f ) must rec ^ uce va -i ue f° r intra-block 

analysis [1] when W ' — 0, and when W = W' to the value for complete random¬ 
ized blocks. When these conditions are imposed, the formula becomes 

1 (\<\ l S0W \ 

144 W \ ^ 3W + W'J * 

This value is slightly larger than the value obtained when the adjusted varieties 
appear together in the same block in groups Z or U, as should be the case. This 
gives us a lower limit. One can arrive at the upper limit in the following manner: 
suppose the variance (intra)i obtained in the intra-block analysis for the difference 
between two varietal means such as Vi and v* is greater than that for varietal 
means v* and v\ (intra) 2 , then it follows that: 


(inter + intra) t g (inter + intra) 2 X 


(intra)i 


. . 1 ~ (intra)* 

Using this relation, the upper limit for two varieties together in the same block 
in groups X or Y is 


A 64 
^'/ 63 ’ 


24W \ 1 3 W + W'J 63 ’ 

which gives a value slightly greater than the formula derived, as it should if it 
is to be the upper limit. In a similar manner one gets the variance for the differ¬ 
ence between varietal means not appearing together in the same block. 

5. Efficiency of the design to the randomized complete blocks. By the
method outlined by Cochran [6] the efficiency can be shown to be measured by
the ratio of
Jfc . 1 


W + W‘ 


to 4 (average error variance of the difference between two plots). 



It will be noted, by using the above formula, that the gain in efficiency for 
the numerical problem given in [1] is 1.003, which for our purpose here is zero. 
This, in general, will not be the case, for on most soils there is a block difference.
In this particular test the ground used had been previously filled in with well 
mixed soil. The efficiency for the analysis given in [1] relative to the randomized 
complete blocks was less than 1.00. 

This paper and the previous one show what a long tedious procedure is necessary
to analyze the data, when the design does not follow the rules for the
construction of the lattice, triple lattice, etc. The complexity of these methods 
stresses the importance, to those designing experiments, of not deviating from 
the established design if the most information is to be secured from the data with 
simple calculations. 
FURTHER REMARKS ON LINKAGE THEORY IN
MENDELIAN HEREDITY

By Hilda Geiringer 
Wheaton College 

In the following an explicit formula for the distribution of genotypes in case of 
three Mendelian characters will be given [formula (5)]. The complete discussion 
of the case m = 3 suggests a supplement (as stated in the last paragraph of this 
paper) to the general limit theorem dealing with m characters. 

In an earlier paper¹ recurrence formulae have been derived which furnish the
distribution of genotypes in the nth generation if the distribution in the (n — l)th 
generation and the “linkage distribution” (l.d.) are known. It was also 
shown how to “integrate” this system of difference equations so as to determine 
the distribution in the nth generation directly from that in the 0th generation. 
This last method, though straightforward, requires however in each particular 
case quite a few operations. 

¹ Hilda Geiringer, Annals of Math. Stat., Vol. 15 (1944), pp. 25-57. The notation in the present Note will be the same as in this paper.

In case $m$, the number of Mendelian characters, equals two, an explicit formula for the problem in question had been known. Denote by $p(x_1, x_2)$, ($x_1, x_2 = 1, 2, \cdots, k$), the "distribution of transmitted genes" in the original, 0th, generation, by $p^{(n)}(x_1, x_2)$ that in the $n$th generation and by $c$ the "crossover probability" (c.p.). Then the simple formula holds:²

(1) $$p^{(n)}(x_1, x_2) = (1 - c)^n p(x_1, x_2) + [1 - (1 - c)^n]\, p_1(x_1)\, p_2(x_2).$$

This may also be written:

(1') $$p^{(n)}(x_1, x_2) = p_1(x_1)\, p_2(x_2) + (1 - c)^n [p(x_1, x_2) - p_1(x_1)\, p_2(x_2)],$$

where $p_i(x_i)$ are the marginal distributions derived from $p(x_1, x_2)$. (1') shows that, in case of independence of the original distribution, $p(x_1, x_2) = p_1(x_1)\, p_2(x_2)$, we have $p^{(n)}(x_1, x_2) = p(x_1, x_2)$ for every $n$. The same is true for arbitrary $p(x_1, x_2)$ if $c = 0$. Otherwise, if $c > 0$ the second term to the right in (1') tends towards zero as $n \to \infty$ and the well known limit theorem results.
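Formula (1) is easy to exercise numerically. The sketch below evaluates it in exact rational arithmetic and checks that applying it with $n = 1$ four times in succession gives the same distribution as applying it once with $n = 4$, which holds because the marginals are unchanged from one generation to the next. The function names and the example distribution are illustrative assumptions.

```python
from fractions import Fraction as F


def marginals(p):
    """Marginal distributions p_1 and p_2 of a joint distribution keyed by (x1, x2)."""
    p1, p2 = {}, {}
    for (a, b), prob in p.items():
        p1[a] = p1.get(a, F(0)) + prob
        p2[b] = p2.get(b, F(0)) + prob
    return p1, p2


def two_locus_generation(p, c, n):
    """Formula (1): distribution in the n-th generation for crossover probability c."""
    p1, p2 = marginals(p)
    keep = (1 - c) ** n
    return {(a, b): keep * prob + (1 - keep) * p1[a] * p2[b]
            for (a, b), prob in p.items()}
```

With `c = 0` the original distribution is returned unchanged for every `n`, matching the remark after (1').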

In case $m = 3$, a remarkably elegant explicit formula exists³ which may be deduced from the author's general theory. In this case the l.d. is completely equivalent to the three c.p.'s $c_{12}$, $c_{23}$, $c_{13}$. The $c_{ij}$ are probabilities with sum $\le 2$, and for which the triangular relation

(2) $$c_{ij} + c_{jk} \ge c_{ik}$$

holds. If $l(\epsilon_1, \epsilon_2, \epsilon_3)$, ($\epsilon_i = 0, 1$), denotes the eight values of the l.d. we have (see quot. [1], p. 32) $l(000) = l(111)$, $l(100) = l(011)$, $l(010) = l(101)$, $l(001) = l(110)$, hence three independent values only. We may introduce

(3) $$2l(000) = v(000) = v_0, \quad 2l(100) = v(100) = v_1, \quad 2l(010) = v(010) = v_2, \quad 2l(001) = v(001) = v_3; \qquad v_0 + v_1 + v_2 + v_3 = 1.$$

It follows easily that

(4) $$c_{ij} = v_i + v_j, \quad (i \ne j;\ i, j = 1, 2, 3).$$

The original distribution $p(x_1, x_2, x_3)$ has marginal distributions $p_{ij}(x_i, x_j)$, $p_i(x_i)$. These values will be denoted briefly by $p_{123}$, $p_{12}$, $p_{23}$, $p_{13}$, $p_1$, $p_2$, $p_3$ respectively. Writing in an analogous way $p^{(n)}(x_1 x_2 x_3) = p^{(n)}_{123}$ the new formula is the following:

(5) $$p^{(n)}_{123} = p_1 p_2 p_3 + [(v_0 + v_1)^n - v_0^n](p_1 p_{23} - p_1 p_2 p_3) + [(v_0 + v_2)^n - v_0^n](p_2 p_{13} - p_1 p_2 p_3) + [(v_0 + v_3)^n - v_0^n](p_3 p_{12} - p_1 p_2 p_3) + v_0^n (p_{123} - p_1 p_2 p_3).$$

This useful formula permits ready computation of $p^{(n)}_{123}$ for every $n$. In terms of the $c_{ij}$, writing

(6) $$d_{ij} = 1 - c_{ij}, \qquad v_0 = 1 - \tfrac{1}{2}(c_{12} + c_{23} + c_{13}),$$

it reads

(5') $$p^{(n)}_{123} = p_1 p_2 p_3 + (d_{23}^n - v_0^n)(p_1 p_{23} - p_1 p_2 p_3) + \cdots + v_0^n (p_{123} - p_1 p_2 p_3).$$
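A direct evaluation of formula (5) may help in reading it. The sketch below is illustrative only: the function names, the dictionary representation of the joint distribution (all eight gene combinations present as keys) and the example values of $v_0, \cdots, v_3$ are assumptions, and exact rational arithmetic is used so that the checks are not affected by rounding.

```python
from fractions import Fraction as F
from itertools import product


def marginal(p, keep):
    """Marginal of a joint distribution keyed by (x1, x2, x3) over the positions in keep."""
    out = {}
    for x, prob in p.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, F(0)) + prob
    return out


def three_locus_generation(p, v, n):
    """Formula (5): n-th generation distribution from the original joint p and v = (v0, v1, v2, v3)."""
    v0, v1, v2, v3 = v
    p1, p2, p3 = (marginal(p, (i,)) for i in range(3))
    p23, p13, p12 = marginal(p, (1, 2)), marginal(p, (0, 2)), marginal(p, (0, 1))
    out = {}
    for x in p:
        a, b, c = x
        indep = p1[(a,)] * p2[(b,)] * p3[(c,)]
        out[x] = (indep
                  + ((v0 + v1) ** n - v0 ** n) * (p1[(a,)] * p23[(b, c)] - indep)
                  + ((v0 + v2) ** n - v0 ** n) * (p2[(b,)] * p13[(a, c)] - indep)
                  + ((v0 + v3) ** n - v0 ** n) * (p3[(c,)] * p12[(a, b)] - indep)
                  + v0 ** n * (p[x] - indep))
    return out
```

For `n = 0` the function returns the original distribution, for every `n` the values sum to one and the one-character marginals are unchanged, and with `v = (1, 0, 0, 0)` (complete linkage) the distribution stays fixed in every generation.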

² H. S. Jennings, Genetics, Vol. 12 (1917) pp. 97-154.

³ Professor Felix Bernstein called this author's attention to the biologically interesting case $m = 3$.
In these formulae the role of independence of the original distribution is clearly seen: If $p_{ij} = p_i p_j$ and $p_{123} = p_1 p_2 p_3$ then $p^{(n)}_{123} = p_{123}$ for every $n$ and every l.d. The same holds for every $n$ and every $p_{123}$ if $v_0 = 1$, which implies that all $c_{ij}$ be zero. If in (5') all $d_{ij} < 1$, hence all $c_{ij} > 0$, the limit theorem $\lim_{n \to \infty} p^{(n)}_{123} = p_1 p_2 p_3$ results. $c_{ij} > 0$ means that complete linkage between any two genes is excluded. If, on the other hand, e.g. $v_0 > 0$, $v_1 > 0$, $v_0 + v_1 = d_{23} = 1$, $c_{23} = 0$, hence $v_0 < 1$, $v_2 = v_3 = 0$, we get $p^{(n)}_{123} \to p_1 p_{23}$. If $c_{23} = c_{12} = 0$ the triangular relation (2) shows that $c_{13} = 0$ too, a case considered above.

It should be noticed that (5) is, of course, in agreement with the author's equation (41) in quot. [1]. It only has to be observed (an obvious fact not mentioned in my earlier paper) that in the former setup the sum of all the $\alpha^{(n)}$ for every fixed $m$ equals one. Thus for $m = 3$:

(7) $$\alpha^{(n)}_{123} + \alpha^{(n)}_{1,23} + \alpha^{(n)}_{2,13} + \alpha^{(n)}_{3,12} + \alpha^{(n)}_{1,2,3} = 1, \quad \text{(for every } n\text{)},$$

and

(8) $$\alpha^{(n)}_{123} = v_0^n, \qquad \alpha^{(n)}_{1,23} = (v_0 + v_1)^n - v_0^n = d_{23}^n - v_0^n,$$

$$\alpha^{(n)}_{2,13} = (v_0 + v_2)^n - v_0^n = d_{13}^n - v_0^n,$$

$$\alpha^{(n)}_{3,12} = (v_0 + v_3)^n - v_0^n = d_{12}^n - v_0^n.$$

The preceding complete discussion of the case $m = 3$ suggests a remark concerning the general case of $m$ characters. In my earlier paper the influence on the main limit theorem of certain ways of degeneration of the l.d. had not been explicitly considered. In the following we shall use the $v$-distribution which is a little shorter to write than the l.d. $l(\epsilon_1, \epsilon_2, \cdots, \epsilon_m)$. The $v$-distribution contains only $2^{m-1}$ values with sum one, defined in a way similar to (3). The main limit theorem ([1], theorem II, p. 42) states in our present notation that

(9) $$\lim_{n \to \infty} p^{(n)}_{12 \cdots m} = p_1 p_2 \cdots p_m,$$

if "complete linkage" between any group of genes is excluded. That implies that not only $v_0 = v(0, 0, \cdots, 0) = 1$ must be excluded but even $v_{ij\cdots}(0, \cdots, 0) = 1$, where this last probability denotes a marginal distribution of the $v$-distribution of an order $\ge 2$. To assure this it is necessary and sufficient that no $v_{ij}(0, 0) = 1$, or no $d_{ij} = v_{ij}(0, 0) = 1$, or no $c_{ij} = 0$. Hence (9) holds if and only if no $c_{ij} = 0$. If this condition is not satisfied the l.d. degenerates in various ways and the limit theorem is to be modified accordingly. If, in particular, $v_0 = 1$, all $c_{ij} = 0$, and $p^{(n)}_{12 \cdots m} = p_{12 \cdots m}$ for every $n$.

Between these two extreme cases ("no $c_{ij} = 0$", "all $c_{ij} = 0$") are the different possibilities of $r < m$ groups of completely linked characters (see [1] p. 36, iv)). Consider e.g. $m = 7$ and $v_{1234}(0000) = 1$, $v_{567}(000) = 1$ (this is realized if $v(0000000) > 0$, $v(0000111) > 0$ with sum of these two numbers equal to one); then $\lim_{n \to \infty} p^{(n)}_{12 \cdots 7} = p_{1234}\, p_{567}$. Here the four characters 1, 2, 3, 4 act as one character and $p^{(n)}_{1234} = p_{1234}$ for every $n$. Also $p^{(n)}_{567} = p_{567}$. Or if, for $m = 6$, $d_{12} = d_{34} = d_{56} = 1$ (realized if $v(000000) > 0$, $v(110000) > 0$, $v(001100) > 0$, $v(000011) > 0$, with the sum of these four values equal to one) then $p^{(n)}_{12 \cdots 6} \to p_{12}\, p_{34}\, p_{56}$. If however for $m = 6$ merely $d_{12} = d_{34} = 1$ (realized if, in a notation analogous to (3), $v_0$, $v_5$, $v_6$, $v_{12}$, $v_{34}$, $v_{56}$, $v_{125}$, $v_{126}$ are the only non-zero values of the l.d.) then $p^{(n)}_{12 \cdots 6} \to p_{12}\, p_{34}\, p_5\, p_6$.

In general, with a proof which consists in a modification of the reasoning (p. 41) of my earlier paper, we may state the following complement to the main limit theorem (9): If the l.d. is such that $r < m$ disjoint groups $G_1, G_2, \cdots, G_r$ of completely linked characters exist, i.e. such that within each group no crossover takes place, each group containing as many of the $m$ numbers as compatible with the definition but not less than two, and all groups together containing $s \le m$ of the $m$ elements, then, as $n \to \infty$, $p^{(n)}_{12 \cdots m}$ converges towards the product of those marginal distributions (of the original generation) which correspond to these groups multiplied by the marginal distributions of order one of the remaining free elements which are not contained in any such group. In a formula:

(10) $$\lim_{n \to \infty} p^{(n)}_{G_1 G_2 \cdots G_r \gamma_{s+1} \gamma_{s+2} \cdots \gamma_m} = p_{G_1}\, p_{G_2} \cdots p_{G_r}\, p_{\gamma_{s+1}}\, p_{\gamma_{s+2}} \cdots p_{\gamma_m}.$$

We may also characterize these linked groups of maximum size by stating that while within each group no crossover takes place there must be at least one c.p. $\ne 0$ among any two such groups and at least one among any group and any free element. It may however be noted that if there is one c.p. > 0 among two groups of complete linkage (or among a group and a free element) then all c.p.’s among these two groups are different from zero. In fact, it follows by repeated use of the triangular relation (2) that if one c.p. among two disjoint groups of complete linkage is zero, all of them are zero. If, e.g., (1, 2, 3) and (5, 6, 8) are two groups of complete linkage, i.e. $v_{123}(000) = 1$ and $v_{568}(000) = 1$, and if besides $c_{15} = 0$, then $v_{123568}(000000) = 1$ and these six elements form a group of complete linkage.

It may be noticed that the above statement of the generalized limit theorem becomes simpler and more elegant by counting “free elements” as groups. It might then run as follows: If $G_1, G_2, \cdots, G_t$ $(t \le m)$ are the maximal groups of completely linked characters, then, under the hypotheses of the earlier paper, the gene distribution in successive generations approaches a limit in which the original (marginal) probabilities within each group $G_i$ are preserved and genes and sets of genes from different groups are independently distributed.


ON THE DEFINITION OF DISTANCE IN THE THEORY OF THE GENE 

By Hilda Geiringer 
Wheaton College 

In several letters to this author Dr. I. M. H. Etherington of the University of Edinburgh has raised questions concerning the author’s definition of “distance” proposed in Section 10 of her paper on Mendelian heredity,1 comparing it with the definition implicit in Professor J. B. S. Haldane’s earlier treatment.* The main content of the author’s paper consists of some general limit theorems and the integration of a certain system of difference equations. The distance definition is a by-product subject to discussion.

1 Annals of Math. Stat., Vol. 15 (1944), pp. 25-57.

“Distance” $d_{ij}$ between two genes i and j is defined by the author as the mathematical expectation of the number of crossovers in the interval (i, j) with respect to the “linkage distribution” (l.d.). This basic concept is introduced as follows (page 32): If S is the set of numbers 1, 2, ..., m (m being the number of Mendelian characters), A any subset of S and $A' = S - A$, we denote by $l(A)$ the probability that an individual with “maternal” genes $x_1, \cdots, x_m$ and paternal genes $y_1, \cdots, y_m$ transmit the paternal genes belonging to A and the maternal genes belonging to $A'$. These $2^m$ probabilities constitute the l.d. From these definitions the equality (G. (53))

(1) $d_{ij} = c_{i,i+1} + c_{i+1,i+2} + \cdots + c_{j-1,j} \quad (i < j)$

is derived, where $c_{ij}$ is the probability of a “crossover” (c.p.) in (i, j). This distance has the required additivity (G. (54)):

(2) $d_{ij} + d_{jk} = d_{ik}, \quad (i < j < k).$

Etherington points out that the term “distance” has an established currency in genetics, being the basis on which chromosome maps are constructed, and that there is a standard method of calculating it in accordance with which (1) is an “approximation valid only when the adjacent c.p.’s are small.” Moreover “the biological uniqueness has been lost, for the value of $d_{ij}$ now depends on the particular set of intermediate genes which we happen to be considering. If any of them are omitted from consideration then the inequality (G. (13))

(3) $c_{ij} + c_{jk} \ge c_{ik}$

shows that in general $d_{ij}$ is diminished, while if new genes are taken into consideration $d_{ij}$ may increase.” “In order that $d_{ij}$ should not depend on a particular choice of intermediate genes the word ‘crossover’ in the definition given would have to be interpreted as ‘chiasma’ instead of ‘odd number of chiasmata’; and then $d_{ij}$ cannot be evaluated in terms of the l.d. alone without further assumptions regarding the interference of crossovers.”

The point of view adopted in the author’s paper was to regard the l.d. as the basis from which everything else has to be inferred. The number m of Mendelian characters is considered constant and the distance, being a mathematical expectation with respect to the l.d., necessarily depends on it. In this conception distance is not a geometric property which can be measured for any two genes independently but rather a system of $m(m - 1)/2$ consistent numbers associated to the m genes. There is no choice regarding the intermediate genes to be taken into consideration; all known genes are to be considered, i.e. one has to use the available relevant information in order to determine the l.d., the c.p.’s and the distances. If the information is incomplete the results will be provisional and subject to change; if it is satisfactory the same will be true for the distances. Thus it is nothing but natural that $d_{ij}$ is changed if some genes are omitted from consideration, or if new genes are discovered. In this set up “crossover”, defined by means of the marginal distributions of second order of the l.d., means a transition from the paternal to the maternal set or vice versa. (Expressed in terms of the chiasma-hypothesis this means “odd number of chiasmata between adjacent genes.”) Additional assumptions “regarding the interference of crossovers” are neither necessary nor admissible. All this is contained in the l.d.

* Quotation [4a] in the author’s paper. References to these papers will be distinguished by the initials H and G.

Haldane’s approach as translated by Etherington into the author’s notation is as follows. “The genes are considered to be distributed continuously along a chromosome. Thus this approach unlike G.’s is not based on the l.d. of a finite set of genes. We must think of one suffix, i, as referring to a gene at a fixed locus on the chromosome, the others to variable loci, so that the c.p.’s are variable. For any three genes i, j, k a quantity p is defined by the equation

(4) $c_{ik} = c_{ij} + c_{jk} - p\,c_{ij}c_{jk}, \quad (i < j < k).$

Biological considerations show that p is a number between 0 and 2 (small when $c_{ij}$ and $c_{jk}$ are both small, increasing, on the whole, with $c_{ij} + c_{jk}$). The distance $D_{ij}$ is defined by the statement

(5) $D_{kj}/c_{kj} \to 1$ as k approaches j $(c_{kj} \to 0)$,

together with the additive property, and from this with (4) Haldane’s general distance expression is derived:

(6) $D_{ij} = \int_0^{c_{ij}} \frac{dc_{ij}}{1 - p_0 c_{ij}}.$

Here $p_0 = p_0(c_{ij})$ denotes the limiting form of p when k approaches j, and represents biologically a property of the chromosome segment (i, j), a measure of interference. Any suitable specification of this function $p_0(c_{ij})$ would constitute a mathematical ‘model’ of the chromosome. If p were constant we should have $p_0 = p$ and

(7) $D_{ij} = -\frac{1}{p} \log (1 - p c_{ij}).$

Both Haldane and Geiringer considered the special cases p = 2 (no interference) and p = 0 (complete interference) for which respectively

(7') $D_{ij} = -\frac{1}{2} \log (1 - 2 c_{ij}),$

(7'') $D_{ij} = c_{ij} = d_{ij}.$

Since p is always between 0 and 2 Haldane concludes that the true value of $D_{ij}$ is between (7') and (7''), and he gives reasons for saying that (7') is nearly correct for genes ‘far apart,’ (7'') for genes ‘close together.’ ”
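The constant-p case can be checked numerically. The following Python sketch is an illustration only and is not taken from either author; the function names `compose` and `distance` are assumptions introduced here. It evaluates relation (4) and formula (7), treating p = 0 as the limiting case (7''), and can be used to confirm that distances computed from (7) add up when the c.p.'s obey (4).

```python
import math


def compose(c_ij, c_jk, p):
    """Relation (4): c.p. of the interval (i, k) from the two adjacent c.p.'s."""
    return c_ij + c_jk - p * c_ij * c_jk


def distance(c, p):
    """Formula (7) for constant p; p = 0 is the limiting case (7''), D = c."""
    if p == 0:
        return c
    return -math.log(1.0 - p * c) / p


if __name__ == "__main__":
    c_ij, c_jk = 0.1, 0.2
    for p in (0.0, 1.3, 2.0):
        c_ik = compose(c_ij, c_jk, p)
        lhs = distance(c_ij, p) + distance(c_jk, p)
        rhs = distance(c_ik, p)
        print(p, c_ik, lhs, rhs)
```

For every p in the loop the last two printed columns should agree up to rounding, which is the additivity property required of a distance.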
If the author is right, this seems to be the standard definition accepted in genetics as mentioned above by Etherington. A few, not exhaustive, comments may be added. Writing in (6) t for the variable of integration and $p_0 = p_0(t)$ it is seen that the expression

(6) $D_{ij} = \int_0^{c_{ij}} \frac{dt}{1 - t\,p_0(t)}$

contains the unknown function $p_0(t)$, which is unspecified except for the statement that it is bounded between 0 and 2. It is immediately seen that with an arbitrary $p_0(t)$ and without a restriction taking the place of (4) this distance (6) will not be additive in the sense of (2). By imposing, after a choice of $p_0(t)$, appropriate restrictions on the $c_{ij}$, additivity may be achieved. For instance in the particular case $p_0(t) = p = \text{const}$, (2) holds by virtue of (4). For such a set of restrictions it has then to be proved that the corresponding “model” is “consistent,” i.e. that the so restricted c.p.’s form a compatible set of marginal distributions of second order of an m-variate distribution, the l.d.

These different points will be exemplified presently by studying the particular case $p_0(t) = p$, where p is a suitably chosen constant; the parameter p is to be fitted to the observations under consideration. It may be impossible to reproduce a set of observations satisfactorily if one parameter only is available. In fact, Haldane’s paper suggests that it is not only the particular case p = const he has in mind. It seems however that if $D_{ij}$ is given by (6) with a non-constant $p_0(t)$, complicated and perhaps (biologically) not very meaningful conditions may have to be introduced in order to assure additivity of the distances and consistency of the respective model. This author was unable to work out examples of more general and at the same time appropriate and fairly simple assumptions for the unknown function $p_0(t)$.

If p = const, then (7) under the restriction (4) furnishes an additive distance definition because:

$-p(D_{ij} + D_{jk}) = \log (1 - pc_{ij}) + \log (1 - pc_{jk}) = \log (1 - pc_{ij} - pc_{jk} + p^2 c_{ij}c_{jk}) = \log (1 - pc_{ik}) = -pD_{ik},$

because of (4). Let us now investigate whether there is a consistent system of c.p.’s satisfying (4). Put, as in G.(48), $c_{i,i+1} = p_i$, combine (4) with G.(50) and write $p = 2\epsilon$. It follows that (4) is satisfied with $0 \le \epsilon \le 1$, if:

(8) $p_{ij} = \epsilon p_i p_j, \quad p_{ijk} = \epsilon^2 p_i p_j p_k, \cdots.$

Here $p_{ij}$ is the probability of the simultaneous occurrence of the “events” numbered i and j, etc. For $\epsilon = 0$ we get “disjoint events” (see G. i) for the discussion of consistency). Assume now $\epsilon > 0$. By some considerations, analogous to those p. 54 G, the following necessary and sufficient condition of consistency follows:

(9) $\prod_{i=1}^{m-1} (1 - \epsilon p_i) \ge 1 - \epsilon \quad (\epsilon > 0).$
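Condition (9) can be compared with a direct check of the joint distribution implied by (8). The sketch below is an illustration under stated assumptions, not the author's derivation: it reads (8) as giving the probability that all events in a set T occur as $\epsilon^{|T|-1} \prod_{i \in T} p_i$, recovers the probability of each exact pattern of events by inclusion-exclusion, and calls the model consistent when no pattern has negative probability. All function names are hypothetical.

```python
from itertools import combinations
from math import prod


def subsets(items):
    """Yield every subset of items as a tuple."""
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def joint(T, p, eps):
    """P(all events in T occur) under model (8); the empty set has probability 1."""
    if not T:
        return 1.0
    return eps ** (len(T) - 1) * prod(p[i] for i in T)


def cell_probabilities(p, eps):
    """P(exactly the events in S occur), by inclusion-exclusion over supersets of S."""
    idx = tuple(range(len(p)))
    cells = {}
    for S in subsets(idx):
        rest = [i for i in idx if i not in S]
        cells[S] = sum((-1) ** len(U) * joint(S + U, p, eps) for U in subsets(rest))
    return cells


def satisfies_9(p, eps):
    """Condition (9): prod(1 - eps * p_i) >= 1 - eps."""
    return prod(1 - eps * pi for pi in p) >= 1 - eps


def is_consistent(p, eps, tol=1e-12):
    """True when every exact pattern of events has non-negative probability."""
    return all(v >= -tol for v in cell_probabilities(p, eps).values())


if __name__ == "__main__":
    for p, eps in (([0.8, 0.8], 0.90), ([0.8, 0.8], 0.95),
                   ([0.5, 0.5, 0.5], 0.70), ([0.5, 0.5, 0.5], 0.80)):
        print(p, eps, satisfies_9(p, eps), is_consistent(p, eps))
```

In each row of the demonstration the two boolean columns should coincide, since the only pattern whose probability can become negative is the one in which no event occurs.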
This restriction (not considered by Haldane or Etherington) is, of course, relevant. If e.g. $m = 3$, $p_1 = p_2 = 4/5$, then $\epsilon$ must be $\ge 15/16$; or if $m = 4$, $p_1 = p_2 = p_3 = 1/2$, $\epsilon \ge 3 - \sqrt{5}$ results. The restriction required by the “linear

theory” is

(10) $p_i \le \frac{1}{2}, \quad (i = 1, 2, \cdots, m - 1).$

Hence this model is consistent under certain restrictions. It is, in contrast to Etherington’s contention, different from iii) G. p. 54. The corresponding distance definition (7) is different from the author’s. The $D_{ij}$ thus defined are additive, and $D_{ij}$ depends on $c_{ij}$ only and not on the intermediate genes. The author’s definition of distances, $d_{ij}$, is general, additive and seems to the author to be well adapted to the biological situation; since the definition of $d_{ij}$ is not related to any particular model it is compatible with any model, which may contain any desired (consistent) assumptions about “interference,” etc. For example in G. iv) p. 55, an n-parametric model has been suggested which seems fairly flexible.

It may however seem more acceptable to the biologist not to use a general 
distance definition but to define “distance” merely in relation to some sufficiently 
general “model” (such that the distance definition would vary with the model), 
instead of accepting an all-over definition as ventured in the author’s paper. 
The particular model (8) in connection with its related distance definition (7) 
might give an example of such an approach.3,4

3 As Etherington remarks, eq. (14') in the author’s original paper is not correct. One can only state that (47) holds. The mistake is however without consequence since no conclusions are drawn from (14'). The same mistake was pointed out by Professor Kai Lai Chung.

4 Etherington writes: “I have been kindly allowed to read Professor Geiringer’s MS. 
and feel that some comments are necessary. 

The standard procedure for calculating the distance between two linked genes is as follows. A selection of intermediate genes is taken and the adjacent crossover values calculated, giving a provisional estimate of the distance as in Geiringer’s formula (1). When further intermediate genes are added to the selection, it is found that the provisional distance increases, but there is apparently a maximum value beyond which it cannot be increased. This unknown maximum value is the distance, and the geneticist accepts (1) as the distance when he is sure that he has observed a sufficient number of intermediate genes to give a good enough approximation to the true distance. Thus Geiringer's formula (1) gives the geneticist’s true distance only on the understanding that it includes all genes intermediate between i and j; but generally speaking the great majority of these genes may be unobservable in the sense that they have no observably distinct alleles by means of which the c.p.’s could be calculated, though from time to time fresh genes may become observable by mutation.

In some cases the above procedure fails because not enough intermediate genes can be observed; then Haldane’s analysis is useful. It should be emphasized that his distance is additive by definition. (For a geometrical analogy, think of the genes as points closely distributed along a curve, chords representing c.p.’s. Haldane's definition of the distance is analogous to defining arc length of the curve as a limiting sum of chords.) In my transcription of his treatment, I should perhaps have made it clearer that the derived formula (6) gives only the distance $D_{ij}$ measured from the initially chosen and fixed gene i to an arbitrary gene j. Other distances $D_{jk}$, $(i < j < k)$, are deduced from it by the postulate of additivity ($D_{jk} = D_{ik} - D_{ij}$). If the origin i is changed, there will be a similar formula (6), but it should not be assumed that the function $p_0$ is the same. In referring to certain conditions necessary ‘to assure additivity,’ Geiringer evidently means conditions that the function $p_0$ may be the same for all origins i. These conditions would be interpreted biologically as asserting uniformity of interference along the chromosome. I agree that there are further points to be cleared up in this connection.

If I might sum up the discussion, I would say that the geneticist's conception of the 
distance between genes is an actual property of the corresponding chromosome segment. 
Geiringer’s definition represents the best possible general approach to this from the limited 
data of the l.d. alone. Haldane's definition fits the geneticist's conception, and his in¬ 
vestigation is an attempt to get the best estimate of the distance by making approximate 
assumptions as to what happens between the observed genes. It is based on the unob¬ 
servable crossover-distribution of a supposed infinite set of genes, but can be applied to 
particular models of this infinite c.d. so as to derive results which involve only a finite and 
observable c.d. Finally it should be mentioned that in the paper quoted, Haldane gave also an alternative method for the case p = 2, leading to the same formula (7'), which is really equivalent to defining the distance as the mathematical expectation of the number of chiasmata (not crossovers in G.'s sense) in the interval (i, j).”


A CRITERION OF CONVERGENCE FOR THE CLASSICAL ITERATIVE 
METHOD OF SOLVING LINEAR SIMULTANEOUS EQUATIONS 

By Clifford E. Berry 

Consolidated Engineering Corporation , Pasadena , Calif. 

The recent development of two devices 1,2 for solving linear simultaneous 
equations by means of the classical iterative method 3 has stimulated the writer 
to investigate convergence criteria for the method. There are in the literature 4 
necessary and sufficient criteria for convergence of symmetric systems, and suf¬ 
ficiency criteria for general systems. So far as the writer knows, however, this 
is the first development of a necessary and sufficient criterion for convergence 
in the general case. The results obtained are applicable to any arbitrary square non-singular matrix in which $a_{ii} \ne 0$.

Let the set of equations be represented by

(1) $AX = G,$

in which A is the square matrix of the coefficients, X is the column matrix of the unknowns, and G is the column matrix of the constant terms. | A | is the determinant of A.

1 Morgan, T. D., Crawford, F. W., “Time-saving computing instruments designed for spectroscopic analysis”, The Oil and Gas Journal, August 26 (1944), pp. 100-105.

2 Berry, C. E., Wilcox, D. E., Rock, S. M., Washburn, H. W., “A computer for solving linear simultaneous equations”, to be published.

3 Hotelling, Harold, “Some new methods in matrix calculation”, The Annals of Mathematical Statistics, Vol. XIV (1943), pp. 1-34.

4 Mises, R. von and Pollaczek-Geiringer, Hilda, “Zusammenfassende Berichte. Praktische Verfahren der Gleichungsauflösung”, Zeitschrift für angewandte Math. und Mechanik, Vol. 9 (1929), pp. 58-77, and 152-164.

We define a matrix $A_1$ which contains the prediagonal and diagonal terms of A, and a matrix $A_2$ which contains the postdiagonal terms of A. According to this definition,

(2) $A_1 + A_2 = A.$

In the classical iterative method, arbitrary (or approximate) values of the x’s are chosen, the first equation is solved for the first unknown, the second equation for the second unknown, etc., using in each equation the most recent approximations to the x’s. This process may be written

(3) $A_1 X^{(1)} + A_2 X^{(0)} = G,$

in which $X^{(0)}$ is the initial approximation matrix, and $X^{(1)}$ is the approximation matrix existing at the end of the first iterative cycle. The superscripts indicate the number of the approximation. The next cycle is described by

(4) $A_1 X^{(2)} + A_2 X^{(1)} = G,$

and the mth by

(5) $A_1 X^{(m)} + A_2 X^{(m-1)} = G.$

The method yields a solution, i.e., converges, if

$\lim_{m \to \infty} (X^{(m)} - X) = 0.$
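One cycle of the process described by (5) can be sketched in a few lines of Python. This is an illustrative sketch, not the author's computing device or procedure; the function names and the small test system are assumptions introduced here. Each equation is solved for its own unknown using the most recent values of the others, which is the same as solving $A_1 X^{(m)} = G - A_2 X^{(m-1)}$ by forward substitution.

```python
def sweep(A, g, x):
    """One iterative cycle: solve equation i for unknown i using the latest values."""
    n = len(A)
    x = list(x)  # work on a copy so the caller's approximation is not modified
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (g[i] - s) / A[i][i]  # requires a non-zero diagonal term
    return x


def iterate(A, g, x0, cycles):
    """Apply the cycle repeatedly, starting from the initial approximation x0."""
    x = list(x0)
    for _ in range(cycles):
        x = sweep(A, g, x)
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [2.0, 3.0]]
    g = [1.0, 2.0]
    print(iterate(A, g, [0.0, 0.0], 50))
```

For this hypothetical system the exact solution is x = 0.1, y = 0.6, and the printed approximation should agree with it closely.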

Solving (5) explicitly for $X^{(m)}$,

(6) $X^{(m)} = A_1^{-1} G - A_1^{-1} A_2 X^{(m-1)}.$

Subtracting X from each side,

(7) $X^{(m)} - X = A_1^{-1} G - A_1^{-1} A_2 X^{(m-1)} - X,$

and making use of (1) and (2)

(8) $X^{(m)} - X = -A_1^{-1} A_2 (X^{(m-1)} - X).$

Since (8) applies for any value of m, we may write

(9) $X^{(m)} - X = (-A_1^{-1} A_2)^2 (X^{(m-2)} - X),$

and continuing this process,

(10) $X^{(m)} - X = (-A_1^{-1} A_2)^m (X^{(0)} - X).$

Now, $\lim_{m \to \infty} (X^{(m)} - X) = 0$ if and only if

(11) $\lim_{m \to \infty} (-A_1^{-1} A_2)^m = 0.$

This is a general result, applicable to any arrangement of the terms of an arbitrary square matrix A, subject only to the conditions that $| A | \ne 0$ and that no diagonal term of A is zero. In this latter exceptional case, the iterative method itself obviously cannot be applied.

The criterion (11) clearly shows that the order in which the elements of the 
matrix A are arranged is important. For instance, it is plain that an arrange¬ 
ment in which the diagonal terms are large and the off-diagonal terms, particu¬ 
larly the post-diagonal terms, are small will tend to favor convergence. 

A somewhat relaxed condition, which is sufficient but not necessary, is obtained through the use of an inequality used by Hotelling 3, namely,

(12) $N(B^m) \le [N(B)]^m,$

in which N(B) is the norm of the matrix B, that is, the square root of the sum 
of the products of its elements by their complex conjugates, or in the case of a 
real matrix the square root of the sum of the squares of the elements. 

The condition is that, if

(13) $N(A_1^{-1} A_2) < 1,$

then

(14) $\lim_{m \to \infty} (A_1^{-1} A_2)^m = 0.$

Criterion (13) is readily computed, since $A_1^{-1}$, the reciprocal of a triangular matrix, is readily computed, and the post-multiplication by $A_2$ involves a number of zero terms.

A more stringent condition than (13), though still not a necessary condition, is that if some finite number p can be found such that

(15) $N[(A_1^{-1} A_2)^p] < 1,$

then (14) follows. Since n matrix squarings result in a value of $p = 2^n$, the size of the norm for fairly large values of p can be investigated without excessive labor.
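Criteria (13) and (15) can be evaluated as in the following Python sketch. It is an illustration under stated assumptions rather than the author's procedure: the helper names are hypothetical, $A_1^{-1} A_2$ is obtained by forward substitution instead of forming the reciprocal explicitly, and the norm is the one defined above for a real matrix. The sign of the matrix is ignored, since it does not affect the norm.

```python
import math


def split(A):
    """Return (A1, A2): prediagonal-plus-diagonal part and postdiagonal part of A."""
    n = len(A)
    A1 = [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    A2 = [[A[i][j] if j > i else 0.0 for j in range(n)] for i in range(n)]
    return A1, A2


def forward_solve(L, M):
    """Solve L Y = M for Y, where L is lower triangular with non-zero diagonal."""
    n, cols = len(L), len(M[0])
    Y = [[0.0] * cols for _ in range(n)]
    for i in range(n):
        for c in range(cols):
            s = sum(L[i][k] * Y[k][c] for k in range(i))
            Y[i][c] = (M[i][c] - s) / L[i][i]
    return Y


def matmul(P, Q):
    """Ordinary matrix product of two square matrices of equal size."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def norm(B):
    """Square root of the sum of the squares of the elements of a real matrix."""
    return math.sqrt(sum(v * v for row in B for v in row))


def norms_by_squaring(A, squarings):
    """Return [N(B), N(B^2), N(B^4), ...] for B = A1^{-1} A2."""
    A1, A2 = split(A)
    B = forward_solve(A1, A2)
    out = [norm(B)]
    for _ in range(squarings):
        B = matmul(B, B)
        out.append(norm(B))
    return out


if __name__ == "__main__":
    print(norms_by_squaring([[1.0, 2.0], [0.3, 1.0]], 2))
```

In the hypothetical 2 by 2 example the first norm exceeds 1, so (13) gives no information, while the norm after two squarings (p = 4) falls below 1, so (15) establishes convergence.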


A REMARK ON INDEPENDENCE OF LINEAR AND QUADRATIC 
FORMS INVOLVING INDEPENDENT GAUSSIAN VARIABLES 

By M. Kac
Cornell University 

The purpose of this note is to call attention to the following useful theorem, 
which to the best of my knowledge was never stated explicitly. 

If $X_1, X_2, X_3, \cdots, X_n$ are identically distributed, independent Gaussian random variables each having mean 0, then the necessary and sufficient condition that

$\sum_{j,k=1}^{n} a_{jk} X_j X_k$ and $\sum_{j=1}^{n} \alpha_j X_j = \alpha \cdot X$

be independent, is that

$A\alpha = 0,$

where A is the matrix of the quadratic form, $\alpha$ the vector $(\alpha_1, \alpha_2, \cdots, \alpha_n)$ and X the vector $(X_1, X_2, \cdots, X_n)$.

Proof of sufficiency.1 Since $A\alpha = 0$, it follows that 0 is an eigenvalue of A, and $\alpha$ is a corresponding eigenvector.

Denoting by $\lambda_2, \cdots, \lambda_n$ the remaining eigenvalues and by $\beta_2, \cdots, \beta_n$ the corresponding eigenvectors, we have

$\sum_{j,k=1}^{n} a_{jk} X_j X_k = \sum_{j=2}^{n} \lambda_j (\beta_j \cdot X)^2.$

Since the $\beta$’s are orthogonal to $\alpha$, it follows that the linear combinations $\beta_j \cdot X$ are independent of $\alpha \cdot X$, and this completes the proof.

Proof of necessity. From the assumption of independence it follows that

$\sum_{j,k=1}^{n} a_{jk} X_j X_k$ and $\left( \sum_{j=1}^{n} \alpha_j X_j \right)^2 = \sum_{j,k=1}^{n} \alpha_j \alpha_k X_j X_k$

are independent. Thus by Craig's theorem 2

$AB = 0,$

where $B = ((\alpha_j \alpha_k))$.

This implies almost immediately that $A\alpha = 0$.
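A familiar instance of the condition $A\alpha = 0$ is the pair formed by the sample mean and the sum of squared deviations from it. The Python sketch below is an added illustration, not part of the note; the function names are hypothetical. It builds the matrix A of the quadratic form $\sum_j (X_j - \bar{X})^2$, takes $\alpha = (1/n, \cdots, 1/n)$ so that $\alpha \cdot X$ is the mean, and checks that $A\alpha$ vanishes.

```python
def centering_form(n):
    """Matrix A with a_jk = delta_jk - 1/n, so that X'AX = sum (X_j - mean)^2."""
    return [[(1.0 if j == k else 0.0) - 1.0 / n for k in range(n)]
            for j in range(n)]


def mat_vec(A, v):
    """Product of the matrix A with the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def quadratic_form(A, x):
    """Value of the quadratic form x'Ax."""
    return sum(xj * yj for xj, yj in zip(x, mat_vec(A, x)))


if __name__ == "__main__":
    n = 4
    A = centering_form(n)
    alpha = [1.0 / n] * n
    print(mat_vec(A, alpha))                        # every component should vanish
    print(quadratic_form(A, [1.0, 2.0, 3.0, 4.0]))  # sum of squared deviations
```

Since $A\alpha = 0$ here, the theorem yields the independence of the mean and the sum of squared deviations for Gaussian samples.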

1 Added in proof: Dr. L. Guttman has kindly pointed out to me that the proof of sufficiency given here has been used by D. Jackson in the article “Mathematical principles in the theory of small samples”, Amer. Math. Month., Vol. 42 (1935), pp. 344-364, see in particular pp. 354-355. Jackson considers only the independence of $\bar{x}$ and $s$, which is of crucial importance in deriving Student’s distribution.

2 A. T. Craig, Annals of Math. Stat., Vol. 14 (1943), pp. 195-197; see also H. Hotelling, ibid., Vol. 15 (1944), pp. 427-429.
Presented on September 16, 1945 at the Rutgers meeting of the Institute

1. On The Variance of a Random Set in n Dimensions. Herbert Robbins, 
Lieutenant USNR Postgraduate School, Annapolis, Md. 

Using a general formula for the moments of the measure of a random set X (Ann. Math. Stat. Vol. XV (1944), pp. 70-74) we find the mean and variance in the case where X is a random sum of n-dimensional intervals with sides parallel to the coordinate axes, thus generalizing the results previously found (loc. cit.) for the case n = 1.

2. The Non-Central Wishart Distribution and its Application to Problems in 
Multivariate Statistics. T. W. Anderson, Princeton University. 

The non-central Wishart distribution is the joint distribution of sums of squares and 
cross-products of deviations of observations from multivariate normal distributions with 
identical variance-covariance matrices and with different sets of means. The rank of the 
non-central Wishart distribution is defined as the rank of the matrix of sets of means. In a previous paper (by M. A. Girshick and the present author) the non-central Wishart distribution is given explicitly for the rank one and two cases and indicated for the case of any
rank. In the present paper the characteristic function of the non-central Wishart distribu¬ 
tion is given for general rank. The distribution, which is given in the form of a multiple 
integral, is the product of a central Wishart distribution and a symmetric function of the 
roots of a determinantal equation involving the matrix of squares and cross products of 
observations and the matrix of population means. It is shown that the convolution of two 
non-central Wishart distributions is again a non-central Wishart distribution if the vari¬ 
ance-covariance matrices are the same. The moments of the generalized variance and the 
moments of the likelihood ratio criterion for testing certain linear hypotheses (for example, 
the hypothesis that the means of a set of populations are identical, given that the matrices 
of population variances and covariances are the same) are obtained for the linear and planar 
non-central cases in terms of infinite series. Likelihood ratio criteria are developed for
testing the dimensionality of the means of a set of multivariate populations (with identical 
variances and covariances) on the basis of one sample from each. The criterion for testing 
whether the dimensionality is h in the space of p dimensions is a symmetric function of p — h 
smallest roots of the determinantal equation involving the sample estimate of the matrix 
of variances and covariances and the sums of squares and cross-products of deviations of 
sample means. The maximum likelihood estimate of the hyperplanes and positions of 
means on them are obtained. The asymptotic distributions of the criteria are $\chi^2$-distributions.

3. The Effect on a Distribution Function of Small Changes in the Population 
Function. Burton H. Camp, Wesleyan University. 

It is generally assumed in the application of distribution theory that, if the actual population function is not very different from the one used in the theory, then the true sampling
distribution of a statistic will not be very different from the one obtained in the theory. 
But elsewhere in mathematics we do not assert that a conclusion will be only slightly modi¬ 
fied by a small deviation in the hypothesis. This paper presents some theorems which are 
useful in determining the maximum effect on a sampling distribution of certain kinds of 
small changes in the population function. 
4. Composite Distributions. Casper Goffman and Benjamin Epstein, Westinghouse Electric Corporation.

Let $f(x; \theta_1, \theta_2, \cdots, \theta_n)$ be a function such that for every point $\theta_1 = \theta_{10}, \cdots, \theta_n = \theta_{n0}$ in parameter space, $\xi$ is a random variable with p.d.f. $f(x; \theta_{10}, \cdots, \theta_{n0})$. Suppose further that the parameters $\theta_1, \theta_2, \cdots, \theta_n$ are themselves random variables whose p.d.f.'s are given respectively by $\phi(\theta_1), \cdots, \phi(\theta_n)$. Using a concept of “probability contained in an interval” and an axiom based on this concept, we show that $\xi$ is a random variable with p.d.f. g(x) given by the formula

(1) $g(x) = \int \cdots \int f(x; \theta_1, \cdots, \theta_n)\, \phi(\theta_1) \cdots \phi(\theta_n)\, d\theta_1 \cdots d\theta_n.$

In this paper we consider statistical properties of the function g(x) in cases of particular interest in applications. The cases treated here are (a) where the mean, $\bar{\xi}$, is the only variable parameter, (b) where the standard deviation, $\sigma$, is the only variable parameter, and (c) where the mean, $\bar{\xi}$, and the standard deviation, $\sigma$, are both variable parameters; $\bar{\xi}$ and $\sigma$ being independent.

It is shown that problems (a) and (b) are equivalent respectively to the sum and product 
of two independent random variables, one of which has zero mean. Formulae for the 
moments in problem (c) are then derived in terms of the formulae obtained for (a) and (b). 

5. Population, Expected Values and Sample. E. J. Gumbel, New School for 
Social Research. 

Let x be an unlimited continuous variate, and let F(x) be the probability of a value equal to, or less than, x. Then the expected mth values $\bar{x}_m$, for n observations, are approximations to the most probable mth values and defined by $F(\bar{x}_m) = F_1 + (F_n - F_1)(m - 1)/(n - 1)$, where $F_1$ and $F_n$ are the probabilities of the most probable first and the most probable last value. The probabilities $F_1$, $1 - F_n$ and $(F_n - F_1)/(n - 1)$ are of the order of magnitude 1/n.

The distribution of the expected values $\bar{x}_m$ differs from the distribution of the sample and from the theoretical distribution. However, for a symmetrical distribution the mean and the odd moments about mean calculated from the expected values coincide with the mean and the moments of the population. For the normal distribution, the expected standard deviation $\sigma(n)$ divided by the standard deviation $\sigma$ of the population and traced on normal probability paper approximates a linear function of $\sqrt{\log n}$. The approach of $\sigma(n)$ toward $\sigma$ is slow. For 500 observations, $\sigma(n)$ is about 99% of $\sigma$. The moments of the distribution of the expected values exist even in the case that the moments of the theoretical distribution diverge.

6. On Optimum Estimates for Stratified Samples. Morris H. Hansen and William N. Hurwitz, Bureau of the Census.

A stratified sample is drawn from a population with R strata. Neyman found the optimum sample allocation for the “best unbiased linear estimate.” However, biased but consistent estimates of the form $x'_i / y'_i$, where both $x'_i$ and $y'_i$ are random variables, have been found to give more reliable results in a large class of problems. Even more efficient estimates can be obtained by finding the values of $n_i$ (the sample size) and $w_i$ which minimize the mean square error of estimates of the form $\sum w_i \frac{x'_i}{y'_i}$ or $\frac{\sum w_i x'_i}{\sum w_i y'_i}$.
7. Pearsonian Correlation Coefficients Associated with Least Squares Theory. Paul S. Dwyer, University of Michigan. (Read by Title)

In least squares theory we have the predicting variable x, the observed value of the 
predicted variable, y , the residual e, and the predicted value of the predicted variable y. 
The purpose of this paper is to study the Pearsonian coefficients resulting from correlating 
all these variables in pairs (a) in the case of a single predicted variable and (b) in the case 
of two or more predicted variables. The results yield such coefficients as multiple correla¬ 
tion, multiple alienation, partial correlation, part correlation, and new coefficients not 
previously in use. The results are given in expanded, determinant, and matrix form. A 
simplified calculational technique is provided. 



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest

Personal Items 

Dr. Kenneth Arnold, recently with the Statistical Research Group, Columbia 
University, has accepted an assistant professorship in Mathematics at the Uni¬ 
versity of Wisconsin. 

Dr. Leo Aroian has returned to his position at Hunter College after serving as 
Research Associate in the Applied Mathematics Panel Project at the University 
of California. 

Mr. Geoffry Beall is now statistician for the Institute of Paper Chemistry at 
Appleton, Wisconsin. 

Mr. Robert E. Breden has accepted a position with the Personnel Research 
Department of Proctor and Gamble at Cincinnati. 

Mr. William F. Elkin, who has been Social Science Analyst with the Vital 
Statistics Division of the Bureau of the Census, has accepted a position as Vital 
Statistician at Oak Ridge, Tenn. 

Mr. Robert M. Ewing of the U. S. Rubber Company has been transferred to
Detroit. He now serves in the capacity of Tire Development Engineer. 

Dr. A. S. Householder, formerly of the University of Chicago, is now with the 
Fire Control Division of the Naval Research Laboratory in Washington. 

Dr. Irving Kaplansky has been appointed to an assistant professorship of 
mathematics at the University of Chicago. 

Mr. Amrom H. Katz has been promoted from Associate Physicist to Physicist 
at the Aerial Photographic Laboratory at Wright Field. 

Dr. William G. Madow of the Bureau of the Census will serve as Visiting 
Professor of Statistics at the University of Sao Paulo, Brazil, for the full academic 
year which begins on March 16. He expects to return to the United States in 
January of 1947. 

Dr. J. E. Morton, formerly of Knox College, has joined the staff of the National 
Bureau of Economic Research. 

Dr. A. C. Olshen has returned from his navy work in Washington to his position 
as Actuary and Chief Examiner of the Oregon Insurance Department at Salem, 
Oregon. 

Mr. Joseph S. Rhodes (formerly Joseph Rosenthal) now holds the position of 
Sampling Specialist in the Bureau of the Census. 

Prof. Paul R. Rider, on leave from Washington University, is teaching at 
Shrivenham American University in England. 

Dr. J. Wolfowitz has accepted an associate professorship in Statistics at North 
Carolina State College. Professor Wolfowitz is serving as Associate Editor of 
the Journal of the American Statistical Association. 
New Members 

The following persons have been elected to membership in the Institute: 

Astrachan, Asso. Prof. Max, Ph.D. (Brown) Antioch College, Yellow Springs, Ohio. 

Bales, R. P., B.A. (Toronto) Tech. Sup., Dominion Rubber Co., St. Jerome, Que. Can. 

Baños, Olegario Fernandez, D.C. (Madrid) Catedratico, Univ. of Madrid, Calle Lopez 
de Hoyos 7, Spain. 

Barkan, Herbert, M.A. (Columbia) Econ. Analyst, 60 8th Ave., Brooklyn, N. Y. 

Bloom, Royal F., M.A. (Minnesota) Lt. Comdr. USNR, Test and Res. Section, Bureau of 
Naval Personnel, 61 C. Ridge Road, Greenbelt, Md. 

Blommers, Paul J., Ph.D. (Iowa) Univ. Examiner and Registrar, 114 Univ. Hall, State 
Univ. of Iowa, Iowa City, Iowa. 

Brier, Glenn, A.M. (George Washington) Meteorologist, US Weather Bureau, Washington 
26, D. C. 

Brixey, Nancy, B.A. (Vassar) Economists, Davis and Gilbert Law Firm, 1 E. 44 St., 70 
East 77 St. New York 21, N. Y. 

Caplan, Benjamin, Ph.D. (Chicago) Econ., OPA, 2831 28th St. N.W., Washington 8, D. C. 

Chassan, Jack, B.S. (C.C.N.Y.) Stat., Office of Stat. Control, Hdq. A.A.F. 3013 30th St. 
S.E., Washington, D. C. 

Cornell, Dr. F. G., Ph.D. (Columbia) U. S. Office of Ed., Tempo. M. 26th and Water, N.W., 
Washington, D. C. 

Cramér, Prof. Harald, Ph.D. (Stockholm) Skärviksvägen 7, Djursholm, Sweden. 

Dempsey, William B., Ph.D. (Harvard) Regent of the School of Commerce and Finance, 
Saint Louis Univ., 8674 Lindell Blvd., St. Louis 8, Mo. 

Derrick, Asst. Prof. Lucile, M.A. (Peabody) Univ. of Chicago, School of Business, 5642 
Kimbark, Chicago 87, Ill. 

Dominguez, Emilia A., Ec.S. (Buenos Aires) Actuary, Supt. Personas Juridicas de Buenos 
Aires, Martinez Castro 765, Buenos Aires, Argentina. 

Dominguez, Jose F., Ec. S. (Buenos Aires) Tech. Council Instituto Nacional de Prevision 
Social, Martinez Castro 765, Buenos Aires, Argentina 

Duncan, Asst. Prof. Acheson, Ph.D. (Princeton) Econ. Dept., Princeton Univ., Princeton, 
N. J. 

Dyson, John D., B.S. (South Dakota State) Major, U. S. Army, Fitzsimons Gen. Hosp., 
Denver, 108 S. Jefferson, Pierre, So. Dak. 

Elmore, Francis B., B.S. (Clemson) Capt., Ord. Dept. Inspection of Ammunition, 505 
Kingston Drive, St. Louis 28, Mo. 

Franzen, Raymond, Ph.D. (Columbia) Stat. Consultant, 10 Rockefeller Plaza, New York 
20, N. Y. 

Friedman, Bernard, Ph.D. (Mass. Inst. Tech.) Res. Math., A.M.P., N.Y.U., 8741 81 St. 
Jackson Heights, N. Y. 

Gordon, J. J., Staff Stat. Eng. Quality Control, Western Electric Company, Inc., 100 Central 
Ave., Kearny, New Jersey. 

Gough, Elsie L., M.A. (Michigan) Auditing Clerk, 648 Blvd. Way, Oakland 10, Calif. 

Greene, Kenneth E., B.S. (Yale) Asst. Res. Mgr., Nat. Broadcasting, 4784 Post Road, 
Pelham Manor 65, N. Y. 

Haskins, Asso. Prof. Elmer E., Ph.D. (Boston) Northeastern Univ., Boston, 68 Damien 
Rd., Wellesley Hills 82, Mass. 

Humes, Helen M., M.A. (Pittsburgh) Price Econ., Bureau of Labor Stat., U. S. Dept. of 
Labor, 8708 84th St., N.W., Washington 8, D. C. 

Jackson, Irwin B., M.A. Mec. Eng. (Pennsylvania) Lt., Cadet Ground School Inst., Box 
163, Tuskegee Army Air Field, Tuskegee, Ala. 

Jarrett, Rheem F., B.A. (Arizona) Lecturer in Psych., Dept. Psych., Univ. of Calif., 
Berkeley 4, Calif. 
Johnsen, Madeline, A.M. (Stanford) 8449 Hth Ave., San Francisco, Calif. 

King, Frederick G., B.A. (Harvard) Capt. U. S. Army, 1629 Que St., N.W., Washington, 
D. C. 

Leipnik, Roy B., B.S. (Chicago) Res. Asst., Cowles Comm. for Res. in Economics, 6687 S. 
Kenwood, Chicago 87, Ill. 

Lesser, Grace L., B.A. (Hunter) Asst. Math., Applied Math. Group, Columbia Univ., 
1576 Unionport Rd., Bronx 68, N. Y. 

deLoor, Prof. Barend, Ph.D. (Amsterdam) Univ. of Pretoria, Pretoria, Union of South 
Africa. 

MacNeish, Harris F., Ph.D. (Chicago) Chairman Math. Dept., Brooklyn College, Bedford 
Ave. & Ave. H, Brooklyn, N. Y. 

Maddrill, James D., Ph.D. (California) Math. Res. and Dev. Ballistic Res. Lab., Aber¬ 
deen Proving Ground, Md. 

Madow, Lilian H., M.A. (American) 1446 Ogden Street, N.W., Washington 10, D. C. 

Martian, Dixon M., M.A. (Columbia) Instr. Math., U. S. Military Academy, West Point, 
N. Y. 

Martin, Charles C., B.S. (U. S. Military Academy) Lt., Ordnance Dept. U. S. Army, Box 
363, Hot Springs, New Mexico. 

Monderer, Phyllis, B.A. (Hunter) Asst. Math., Applied Math. Group, Columbia Univ. 
Div. War Res., New York, N. Y., 589 West 170 Street, New York 88, N. Y. 

Moore, Margaret W., B.A. (Wilson) Stat., P-3, War Dept., Letterkenny Ordnance Depot, 
Chambersburg, Pa., 804 Lincoln Way West, Chambersburg, Pa. 

Pope, Otis, Ph.D. (Iowa State) Senior Biometrician, USDA, Tech. Collaboration Branch, 
Washington, D. C. 

Priestley, Alice E., M.A. (New York) Instr. Stat. and Math., Wilson College, Chambers¬ 
burg, Pa. 

Rafferty, J. Allan, B.S. (Harvard) Medical Student, Pfc., ASTP (AUS) Box 236, Rochester 
Med. School, Rochester 7, N. Y. 

Randall, Robert J., B.S. (Yale) Lt., Post Weight and Balance Officer, Tuskegee Army Air 
Field, Tuskegee, Ala. 

Reiner, Mae, B.A. (Hunter) Asst. Math., Applied Math. Group, Columbia Univ., Div. 
of War Res., 170 Second Avenue, New York 8, N. Y. 

Rodal, Prof. Juan A., Ph.D. (Buenos Aires) Univ. of Buenos Aires, Aviles 8755, Buenos 
Aires, Argentina. 

Rubin, Herman, S.M. (Chicago) 7148 East End Ave., Chicago 49, Ill. 

Schmalz, W. H., B.A. (Toronto) Tech. Supt., Merchants Factory Dominion Rubber Co., 
Kitchener, Ont., Canada. 

Simmons, Willard R., M.A. (Duke) Head of Stat. Section, Food and Automotive Rationing 
Div., OPA, 1480 Saratoga Ave., N.E., Washington, D. C. 

Sobel, Milton, B.S. (C.C.N.Y.) 88 Elliot Place, The Bronx, N. Y. 

Stauber, B. R., M.A. (Minnesota) Chief, Relocation Planning Div., War Relocation 
Authority, U. S. Dept. of the Interior, 9701 Bexhill Drive, Kensington, Maryland. 

Steen, Jerome R., B.S. (Wisconsin) Mgr., Quality Control Eng., Sylvania Electric Prod¬ 
ucts, Inc., Emporium, Pa. 

Sullivan, John W., Sc.D. (Mass. Inst. Tech.) Metallurgist, American Iron and Steel In¬ 
stitute, 360 Fifth Ave., New York 1, N. Y. 

Trowbridge, Frederick, Quality Control Eng., Sentinel Radio Corp., 2020 Ridge Ave., 
Evanston, Ill. 

Week, Frank A., B.A. (Stanford) Capt., MAC, AUS, Chief, Stat. Analysis Branch, Med¬ 
ical Stat. Div., Office of the Surgeon General, 1818 H St., N.W., Washington 26, D. C. 

Weiss, Samuel, M.A. (Michigan) Chief, Manpower Estimates Section, War Manpower 
Comm., 8078 S. Buchanan, Arlington, Virginia. 

Wold, Prof. Herman O., Ph.D. (Stockholm) Univ. of Uppsala, Stat. Inst., Odinslund 2, 
Uppsala, Sweden. 





Announcement of the St. Louis Meeting of the Institute 

The Institute of Mathematical Statistics will hold a joint meeting with 
Section A (Mathematics) of the American Association for the Advancement of 
Science on Saturday, March 30 at 2 P.M. in St. Louis. All the details are 
not yet available but the session will feature (1) contributed papers on Statis¬ 
tics and Probability, (2) an address by Lt. Commander John H. Curtiss on the 
topic Statistical Inference and its Engineering Applications, and (3) an address 
by Mr. Morris H. Hansen on Sampling Problems in Surveys of Business and 
Population. 


Meeting of Washington Chapter 

A joint regional meeting of the Washington Chapter of the Institute and the 
Washington Chapter of the American Statistical Association is being planned 
for April 12-13, 1946. 



MEMBERS OF THE INSTITUTE OF MATHEMATICAL 
STATISTICS* 

(As of November 16, 1946) 

(The names of Fellows of the Institute are designated by * and Life Members by †) 

Abbey, Helen M.A. (Michigan) Stat., Bur. of Records and Stat., Mich. Dept. of Health, 
916 N. Chestnut, Lansing, Mich. 

Acerboni, Prof. Argentino V. Dr. Be. (Buenos Aires) Facultad de C. Economicas, Buenos 
Aires, Argentina, Larroque 238, Banfield, Argentina 

Acton, Forman Ch.E. (Princeton) T/4, Army of US, Corps of Engineers, S.E.D., Barracks 
Area, Oak Ridge, Tenn. 

Aitchison, Beatrice Ph.D. (Johns Hopkins) Econ. and Stat. Analyst, Interst. Com¬ 
merce Comm., Washington 25, D. C. 1929 S St., N.W., Washington 9, D. C. 

Allen, Prof. Roy G. D.Sc. (London) London School of Econ., Houghton St., Aldwych, 
London, W.C. 2. 

Allendoerfer, Asso. Prof. Carl B. Ph.D. (Princeton) Haverford College, Haverford, Pa. 

Alt, Franz L. Ph.D. US Army, 271 Fort Washington Ave., New York City 82 

Alter, Dinsmore Ph.D. (California) Res. Asso. in Math. Theory of Stat., Calif. Inst. of 
Tech., Dir. Griffith Observatory, Los Angeles, Calif., Col. T. C., US Army, 211 Pier 2, 
Brooklyn Army Base, Brooklyn, N. Y. 

Anderson, Paul H. Ph.D. (Illinois) Econ. Analyst, Office of Surplus Property, Dept. of 
Commerce, Washington, D. C. 1228 Blair Mill Rd., Silver Spring, Md. 

Anderson, Asso. Prof. Richard L. Ph.D. (Iowa State) Res. Math., Inst. of Stat., N. C. 
State College, Raleigh, N. C. 

Anderson, Theodore W., Jr. Ph.D. (Princeton) Res. Math., Cowles Commission for 
Res. in Econ., Univ. of Chicago, Chicago 37, Ill. 

Andrews, Asst. Prof. T. Gaylord Ph.D. (Nebraska) Univ. of Chicago, Chicago, Ill. 

Angell, Dorothy T. Stat. Analyst, Bell Tel. Labs., Murray Hill, N. J. 

Arias, B., Jorge C.E. (Guatemala) 3 Avenida Sur 65, Guatemala City, Guatemala, 
Central America 

Arnold, Asso. Prof. Herbert E. Ph.D. (Yale) Wesleyan Univ., Middletown, Conn. 

Arnold, Asst. Prof. Kenneth J. Ph.D. (Mass. Inst. Tech.) Univ. of Wisconsin, Madison 
6, Wis. North Hall 

Aroian, Leo A. Ph.D. (Michigan) Instr. Hunter Coll., New York City. 247 Wadsworth 
Ave., New York City 88 

Arrow, Kenneth J. M.A. (Columbia) Lydig Fellow, Columbia Univ., 116th St. and 
Broadway, New York City, Capt. AC, Hq. AAF, Weather Service, Asheville, N. C. 
218 South French Broad Avenue, Asheville 

* Members were asked to supply fresh information for this Directory. Records may be 
inexact or incomplete (1) because of the failure of some member to comply with this request, 
(2) because the directory card became obsolete as a result of an unreported change of address, 
(3) because information about position did not accompany a notice of change of address, or 
(4) because it is impossible to give all the information about men on leave in the standard 
form of “position,” “address,” and (in italics) “home or mail address.” Some members 
on leave or in the services have reported the permanent address. Some have reported the 
“on leave” or “APO” address, as the mailing address. The addresses given are the last 
reported addresses. When an address is known to be in error, it is followed by (last address). 
Changes in addresses or errors in names, titles or addresses, should be reported to 
the Secretary. 
Astrachan, Asso. Prof. Max Ph.D. (Brown) Antioch College, Yellow Springs, Ohio 

Attner, George B.A. (Western Reserve) H418 Iowa Ave., Cleveland 8, Ohio 

Bachelor, Robert W. M.B.A. (Washington) American Bankers Assn., 22 East 40th St., 
New York City 16 

Bacon, Asso. Prof. Harold M. Ph.D. (Stanford) Stanford Univ., Stanford, Calif. Box 
1114 

Bailey, Arthur L. B.S. (Michigan) Stat., American Mutual Alliance, 60 E. 42nd St., 
New York City, P. O. Box 878, Ramsey, N. J. 

*Baker, Asst. Prof. George A. Ph.D. (Illinois) Asst. Prof. of Math. and Asst. Stat., 
Exp. Sta., Coll. of Agri., Univ. of California, Davis, Calif. 

Baldwin, Woodson W. S.B. (Mass. Inst. Tech.) Capt., Ord. Dept., USA Office of Field 
Dir. of Ammunition Plants, 3629 Lindell Blvd., St. Louis 8, Mo. 3745 Lindell Blvd. 

Bales, R. P. B.A.Sc. (Toronto) Tech. Supt., Dominion Rubber Co., St. Jerome, Que. 
Canada 

Bancroft, Asst. Prof. Theodore A. Ph.D. (Iowa State) Iowa State Coll., Math. Dept., 
Ames, Iowa 

Baños, Olegario Fernandez D.C. (Madrid) Catedratico, Univ. of Madrid, Calle Lopez 
de Hoyos 7, Spain 

Barkan, Herbert M.A. (Columbia) Econ. Analyst, 60 8th Ave., Brooklyn, N. Y. 

Barnes, Jarvis M.A. (George Peabody Coll. for Teachers) Atlanta Board of Educ. 
14th Floor, City Hall, Atlanta, Ga. 

Barnes, Prof. John L. Ph.D. (Princeton) Chairman, Dept. of Applied Math., Tufts 
Coll., Medford 55, Mass., 16 Ardley Road, Winchester 

Barr, Prof. Arvil S. Ph.D. (Wisconsin) Univ. of Wisconsin, Madison, Wis. 

Barral-Souto, Prof. Jose Sc.D. (Buenos Aires) Univ. of Buenos Aires, Buenos Aires, 
Argentina, Cordoba 1459 

*Bartky, Asso. Dean Walter Ph.D. (Chicago) Univ. of Chicago, Chicago, Ill. 

Bartlett, Maurice D.Sc. (London) Univ. Lecturer, Cambridge, 137 Chesterton Road, 
Cambridge, Eng. 

Bassford, Horace R. B.A. (Trinity) Vice Pres. and Actuary, Metropolitan Life Ins. Co., 
1 Madison Ave., New York City 10 

*Baten, Prof. William D. Ph.D. (Michigan) Prof. of Math., Mich. State Coll. and Res. 
Prof. Mich. Agri. Exp. Sta., Mich. State Coll., E. Lansing, Mich. 411 Marshall St. 

Bates, Prof. O. Kenneth Sc.D. (Mass. Inst. Tech.) Prof. of Math. and Head of Dept., 
The St. Lawrence Univ., Canton, N. Y. 

Battin, Asst. Prof. Isaac L. A.M. (Swarthmore) Drew Univ., Madison, N. J. 
14 Glenwild Rd. 

Beall, Geoffrey Ph.D. (London) Res. Asso., Inst. of Paper Chemistry, Appleton, Wis. 

Bechhofer, Robert E. B.A. (Columbia) Stat., The Kellex Corp., 233 Broadway, New 
York City. 181 Degraw Avenue, Teaneck, N. J. 

Becker, Harold W. Elec. Inst., Mare Is. Training School, Bldg. 146, Mare Island, Calif. 
1486 Amador, Vallejo 

Beckstead, Gordon L. Lt.(j.g.) USNR, Weather Central NAS, San Diego, Calif. 

Beebe, Gilbert W. Ph.D. (Columbia) Lt. Sn C., AUS Control Div., Office of the Sur¬ 
geon General, 1818 H St., N.W., Washington, D. C. 

Been, Richard O. M.A. (George Washington) Sr. Agri. Econ., US Bur. of Agri. Econ., 
3433 South Bldg., Washington, D. C. 

Bellinson, Harold R. S.M. (Mass. Inst. Tech.) Industrial Eng., War Dept., Ord. Dept., 
Pentagon Bldg., Arlington, Va. 8416 B St., S.E., Washington 19, D. C. 

Belz, Asso. Prof. Maurice H. M.A. (Melbourne) Univ. of Melbourne, Carlton, N. 3, 
Victoria, Australia 

Bennett, Prof. Albert A. Ph.D. (Princeton) Brown Univ., Providence, R. I. 





Bennett, Blair M. M.A. (Columbia) Asso. Math., Nat. Bur. of Standards, Washington, 
D. C. 1410 M St., N.W. 

Bennett, Carl A. M.A. (Michigan) Chem., Clinton Engineer Works, Tenn. Eastman 
Corp., Knoxville 5, Tenn. 897 West Tenn., Oak Ridge 

Berger, Richard M.A. (Columbia) Asso. Stat., Office of Price Admin., Washington, 
D. C. Lt.(j.g.) USNR Communication Officer, USS Gainard (00700) c/o FPO San 
Francisco, Calif. 85 Rugby Road, Rockville Centre, N. Y. 

Berkson, Joseph D.Sc. (Johns Hopkins) Col. M.C., US Army, AAF, Office of the Air 
Surgeon, Washington 26, D. C. 

Berman, Abraham J. M.A. (Brooklyn) Stat., N. Y. State Dept. of Labor, 80 Center St., 
New York City, 1460 College Ave., Bronx, N. Y. 

Berwick, Leo A.B. (New York) Capt., A.C. Asst. to Surgeon Stat. Unit of Psych. 
Sect., Hdq. AFTRC, T & P Bldg., Fort Worth 2, Texas 

Bickerstaff, Asst. Prof. Thomas A. M.A. (Mississippi) Univ. of Mississippi, State 
College, Miss. 

Bigelow, Julian H. 401 W. 118th St., New York City 87 

Birnbaum, Asst. Prof. Z. William Ph.D. (Lwow) Univ. of Washington, Seattle, Wash. 

Blackadar, Walter L. B.A. (McMaster) Asso. Actuary, Equitable Life Assurance So¬ 
ciety of the US, 893 7th Ave., New York City 1 

Blackburn, Asso. Prof. Raymond F. Ph.D. (Pittsburgh) Head, Dept, of Stat., Univ. of 
Pittsburgh, Pittsburgh 13, Pa. 

Blackwell, Asst. Prof. David Ph.D. (Illinois) Math. Dept. Howard Univ., Washington, 
D. C. 

Blake, Archie Ph.D. (Chicago) Ballistic Res. Lab., Aberdeen Proving Gd. Box 86, 
Aberdeen, Md. 

Blanche, Ernest E. Ph.D. (Illinois) Foreign Econ. Admin., 515-22nd St., N.W., Wash¬ 
ington, D. C., 9409 Montgomery Ave., N. Chevy Chase, Md., APO 24741 c/o Postmas¬ 
ter, New York City 

*Bliss, Asso. Prof. Chester I. Ph.D. (Columbia) Biometrician, Conn. Agri. Exp. Sta., 
Lecturer in Biometry, Yale Univ., New Haven, Conn. 

Bloom, Rose B.A. (Hunter) 1275 SCSU, Fort Jay, N. Y. 

Bloom, Royal F. M.A. (Minnesota) Lt. Comdr., 4717 Arlington Annex Navy Dept., 
Washington, D. C. 

Blommers, Paul J. Ph.D. (Iowa) Univ. Examiner and Registrar, 114 Univ. Hall, State 
Univ. of Iowa, Iowa City, Iowa 

Boddie, John B., Jr. Chief: Budget Formulation, Foreign Econ. Admin., Washington, 
D. C. 8638 Tunlaw Rd., N.W. 

Bonis, Austin J. B.S. (C.C.N.Y.) Major, G-I War Dept. Gen. Staff, Washington, D. C. 
8500 Que St., N.W. 

Bonnar, Robert U. M.S. (Washington) 819 Jefferson St., Vallejo, Calif. 

Boozer, Mary E. A.M. (Chicago) Stat. Res., Virginia State Planning Dc., 301 Finance 
Bldg., Richmond 19, Va. 

Borland, James M.A. (Indiana) Capt., Ex. Officer, Inspection Office, Pine Bluff Ar¬ 
senal, Ark. 

†Bowen, Earl K. M.A. (Boston) Instr. in Math., Northeastern Univ., 360 Huntington 
Ave., Boston, Mass. 846 Union St., Norwood 

Boschan, Paul Ph.D. (Vienna) Econ. Inst., 500 Fifth Ave., New York City. W4 W. 
40th St., New York City 18 

Bower, Oliver K. Ph.D. (Illinois) Associate, Univ. of Ill., Urbana, Ill. 605 W. John, 
Champaign 

†Bowker, Albert H. S.B. (Mass. Inst. Tech.) Student, Columbia Univ., New York City 
27, 88 Arden Place, Yonkers 3, N. Y. 
Brady, Dorothy S. Ph.D. (California) Home Ec. Specialist, Bur. of Home Econ., Wash¬ 
ington, D. C. 6678 Fulton St., N.W. 

Brandt, Alva E. Ph.D. (Iowa State) Chief, Erosion Control Practices Div., Soil Conser¬ 
vation Service, USDA, US Dept. of Agri., Washington, D. C., Box 185, Route 8, 
Vienna, Va. 

Brearty, Charles R. B.S. (California) Major, US Army, Signal Corp. Inspection Agency, 
12th Floor, Public Ledger Bldg., 6th and Chestnut Sts., Philadelphia 6, Pa. 

Breden, Robert E. B.S. (Kansas) Personnel Tech., Personnel Res. Dept., The Proctor 
& Gamble Co., 6th and Main Sts., Cincinnati, Ohio 

Bridger, Clyde A. M.S. (Oregon) Inst. Math., Univ. of Utah, Salt Lake City 1, Utah, 
886 Douglas Street, Salt Lake City 2 

Brier, Glenn A.M. (George Washington) Meteorologist, US Weather Bur., Washington 
26, D. C. 

Brixey, Asso. Prof. John C. Ph.D. (Chicago) Univ. of Oklahoma, Norman, Okla. 927 
S. Pickard St., Norman 

Brixey, Nancy A.B. (Vassar) Economists, Davis and Gilbert, Law Firm, 1 E. 44 St. 
70 East 77 St., New York City 21 

Bronfenbrenner, Martin Ph.D. (Chicago) Lt. (j.g.) USNR, Office of CINCPAC, c/o 
Postmaster, San Francisco, Calif. 728 N. First Ave., Tucson, Ariz. 

Brookner, Ralph J. Ph.D. (Columbia) Lt., USNR Navy Board, Washington, D. C. 
90 Riverside Drive, L., New York City 24 

Brooks, Alvin G. B.A. (Ripon) Chief of Inspection Tasks Sect., Western Electric Co., 
Hawthorne Sta., Chicago, Ill. 4538 Lawn Ave., Western Springs 

Brown, Asst. Prof. Arthur B. Ph.D. (Harvard) Queens Coll., Flushing, N. Y. 

Brown, Arthur W. A.B. (Princeton) Res. Asso., Columbia Univ., Div. of War Res., 
New York City, Columbia Res. Group M, Room 4311, COMINCH, Navy Dept., 
Washington, D. C. 

Brown, George W. Ph.D. (Princeton) Res. Eng., RCA Labs., Princeton, N. J. 

†Brown, Richard H. A.B. (Columbia) Asso. Math., Navy Dept., Bur. Ord., Washington, 
D. C. Rm 816-1 8415 88th St., Washington 16, D. C. 

Brown, Prof. Theo. H. Ph.D. (Yale) Bus. Stat. Harvard Bus. School, Soldier’s Field, 
Boston 63, Mass. 

Brumbaugh, Prof. Martin A. Ph.D. (Pennsylvania) Univ. of Buffalo, Crosby Hall, 
Buffalo 14, N. Y. 

Bruner, Nancy M.A. (Iowa) Stat., Western Auto Supply Company, Kansas City 8, 
Mo. 7611 Main Street, Kansas City 5 

Bruyere, Martha M.D. (Chicago) Stat. US Public Health Service, Bldg. T6, Bethesda, 
Md. R.F.D. Route #1, Gaithersburg 

Bruyere, Paul T. M.P.H. (Yale) Stat. US Public Health Service, Bldg. T6, Bethesda, 
Md. R.F.D. Route #1, Gaithersburg 

Bryan, Joseph Ed. M. (Harvard) Mass. Inst. of Tech., Cambridge, Mass. Apt. 608, 
1010 26th St., N.W., Washington, D. C. 

Budne, Thomas A. M.A. (N. J. State Teachers Coll.) Inst. of Math., N. J. State Teach¬ 
ers Coll., Upper Montclair, N. J. 2088 76th St., Brooklyn 14, N. Y. 

Bunke, Alfred M.A. (Columbia) Sen. Stat., N. Y. State Dept. of Labor, 87 Parkwood 
St., Albany 8, N. Y. 

Burgess, Robert W. Ph.D. (Cornell) Chief Econ., Western Electric Co., 196 Broadway, 
New York City 7 

Burington, Asso. Prof. Richard S. Ph.D. (Ohio) On leave from Case School of Applied 
Science, Cleveland, Ohio, Head Math., Bur. Ord., USN, 5200 N. Carlin Spring Rd., 
Arlington, Va. 

Burk, Marjorie B.A. (Hunter) Stat., Weather Service, Hdq. AAF, Washington, D. C. 
1912 Third Street, N.E. 
Buros, Oscar K. M.A. (Columbia) Major, Signal Corps, Chief, Standards Sect., School 
Div., AAF, Washington, D. C. SOI S. Courthouse Rd., Arlington, Va. 

Burr, Asso. Prof. Irving W. Ph.D. (Michigan) Purdue Univ., W. Lafayette, Ind. 

Bushey, Asso. Prof. J. Hobart Ph.D. (Michigan) Hunter Coll., 695 Park Ave., New 
York City 21 

*Camp, Prof. Burton H. Ph.D. (Yale) Wesleyan Univ., Middletown, Conn. 110 Mt. 
Vernon St. 

Campbell, Asso. Prof. Frances L. Ph.D. (Michigan) Geo. Pepperdine Coll., 1121 W. 
79th St., Los Angeles, Calif. 

Campbell, George C. M.S. (Iowa) Supervisor, Metropolitan Life Ins. Co., 1 Madison 
Ave., New York City 10. Troy Road R.F.D. #1, Boonton, N. J. 